JOSEPH W. SIMMONS and MARK J. GUTTMANN 
La Salle College 


STATES, WAVES AND PHOTONS: 
A Modern Introduction to Light 




ADDISON-WESLEY PUBLISHING COMPANY 
Reading, Massachusetts · Menlo Park, California · London · Don Mills, Ontario


This book is in the 
ADDISON-WESLEY SERIES IN PHYSICS 


Consulting Editor: DAVID LAZARUS 


Copyright © 1970 by Addison-Wesley Publishing Company, Inc. 
Philippines copyright 1970 by Addison-Wesley Publishing Company, Inc. 


All rights reserved. No part of this publication may be reproduced, stored in a retrieval 
system, or transmitted, in any form or by any means, electronic, mechanical, photo- 
copying, recording, or otherwise, without the prior written permission of the publisher. 
Printed in the United States of America. Published simultaneously in Canada. Library of 
Congress Catalog Card No. 73-102998. 


To Henry J. Bolger, C.S.C., 1901-1964
Chairman, Department of Physics, University of Notre Dame, 1936-1963 


If you were there, you understand why.


PREFACE 


It is the nature of any realm of human endeavor to establish what must be 
called a body of accomplishment. For example, the paintings of Rembrandt 
and the mechanics of Newton, respectively, belong to the realms of art and 
physical science. Few, if any, would criticize the viewpoint that these 
achievements should rightfully be called art and science. To put this another 
way, any attempt to define art or science would have to be broad enough to 
include Rembrandt’s paintings under the term “art” and the laws of Newton 
under the term “physics” and, except in some emotional sense, to exclude
them from the other category. Inevitably, such a body of accomplishment 
expands, usually in quite unpredictable ways. These ways are such that what 
was once a good delineation of a field becomes inadequate. Two hundred 
years ago it would perhaps have been possible to give a definition of art which 
would completely exclude the whole of modern nonrepresentational painting. 
Yet, whether we like these paintings or not, it is generally considered that 
this modern art is indeed art. The point to be taken from the above is that 
there is a quite definite relation between the accomplishments made in a given 
field and the definition of that field. To be concrete, what physics is and what 
physicists do must always be a question which cannot be completely answered 
except by hindsight. 

Most intermediate optics texts relegate to a final chapter or two those 
aspects of physics which have forced us in this century to alter drastically 
our “definition” of physics. As a result, after having been introduced to the 
photon concept and quantum mechanics in either atomic physics or a 
quantum mechanics course, the student does quite well until a nagging 
thought appears: “What happened to the electromagnetic wave?” Part of
the pedagogic difficulty may be that the student has formed an incomplete 
definition of physics in general and optics in particular. If the student 
receives a thoroughly rigorous course in classical optics, he emerges feeling
that what he sees is an electromagnetic wave. Certainly, after exposure to
the triumphs of the classical theory and its experimental confirmation, it is 
awkward for him to start again with a photon description, leaving no bridge 
except the correspondence principle and some mention of the existence of 
quantum electrodynamics. The present text is an attempt to solve this 
pedagogic problem by an a priori inclusion of quantum effects as a funda- 
mental part of “optics”. Such an attempt leads naturally to the basic formal- 
ism of quantum mechanics within the confines of a two-state system. This 
formalism is exploited in Chapters 2 through 5 for two principal reasons: 


1) it is a valid description of the photon as a quantum mechanical entity; 
2) it introduces the student to a nomenclature and approach that character- 
ize much of the physics of our time. 


The latter reason is not insignificant for those engaged in undergraduate 
teaching. Such teachers are well aware of the difficulty of devising curricula 
that are in touch with the physics of their generation. For example, the space 
devoted to the representation and production of polarization states might 
seem disproportionately large, unless understood within the framework of an 
undergraduate curriculum. 

The physics programs at most schools are rather different from those of 
a decade ago. This is inevitable, but it is well known that the upper-division
course in optics has often been a casualty. However, after six years of utilizing
the approach presented here, the authors have found the optics course trans- 
formed from a luxury to a dynamic and integral part of the curriculum. At 
present this text forms the basis for a one-semester course (three hours per 
week plus laboratory work) given in the second semester of the junior year 
at La Salle. There is more than enough material in this book for such a 
course, and a selection must be made. Although other choices are certainly 
possible, our method of presenting the material may be of interest. The 
material of Chapter 1 is covered in the third semester of the general physics
course, and the junior course begins with Chapter 2 and the understanding
that the student is familiar with the elements of matrix algebra. The students 
also have had a strong course in electromagnetic theory. To accomplish the 
central purpose of the course (as outlined in the following paragraph) within 
the allotted time, we have generally omitted Sections 6.4 and 6.5, on the 
Cornu spiral, and Sections 7.9 through 7.16, dealing with coherence. With 
the exception of the detailed calculations in Appendix VI, the remaining 
material in the text is covered without difficulty. 

The principal motivation which guided us in the writing of the text was 
to show how polarization states, the electromagnetic field, and the spin of 
the photon are interrelated. Thus, an abrupt change of pace is necessary in 
Chapter 7 when the classical electromagnetic field is introduced so that it can
later be related to the angular momentum of a photon in a K-space
representation in Chapters 8 and 9. Then, coming full circle, the eigenstates of
the spin operators for the electromagnetic field in K space are the ket vectors 
of the Jones calculus of the early chapters. 

By the very nature of its approach, the text serves the purpose of an 
introduction to the concepts and techniques of quantum mechanics. For this 
reason it is subtitled “A Modern Introduction to Light” rather than “An
Introduction to Modern Optics.” The distinction is nontrivial and explains
the absence of many important topics, e.g., optical instruments, computer 
lens design, nonlinear optics, etc. 

Finally, the authors would like to thank for their suggestions and criti- 
cisms the many students who have suffered through preliminary versions of 
this text. 


Philadelphia, Pa. J. W. S. 
February 1970 M. J. G. 


CONTENTS


Chapter 1  Matrix Optics
    A statement of the problem
    The matrix operators
    The S- and V-matrices
    Significance of the individual elements $s_{ij}$
    The principal planes and lens equations
    The nodal points
    Summary

Chapter 2  The Experimental Evidence
    The quantitative measurement of light and quanta
    Polarizing filters
    States and prediction
    Superposition of states
    E-states
    Kets
    The linear and angular momenta of light
    Anisotropic retarders
    Polarization by absorption
    Optical activity

Chapter 3  The Representation and Production of States
    The vector and matrix representation
    The bracket product and probability amplitudes
    The superposition of P-states
    Retardation plates and their matrix representation
    An elliptical polarizer
    Transmission matrices in other bases
    Optical activity
    Dichroic polarizers

Chapter 4  The Stokes Parameters
    The projection operator
    The Stokes parameters
    The Mueller matrices
    Probability amplitudes in the Stokes formalism
    The Poincaré sphere
    The sigma matrices as instrument operators
    Partial polarization

Chapter 5  The Methods of Quantum Mechanics
    Eigenstates and eigenvalues
    Hermitian operators
    A class of experiments
    Expectation values
    The postulates of quantum mechanics
    The time operator and commutation
    An example of a time operator
    The linear momentum operator
    The time dependence
    Functions as vectors
    Continuous superposition of states
    Spatial localization
    Temporal localization
    Commutation and uncertainty

Chapter 6  Diffraction and Image Formation
    Green's theorem method
    Apertures in plane screens
    Fresnel diffraction
    The Cornu spiral
    The straight edge and slit
    Fraunhofer diffraction
    The rectangular aperture
    The diffraction grating
    The circular aperture
    Image formation in coherent light
    Spatial filtering in coherent illumination

Chapter 7  The Electromagnetic Field and Coherence
    The classical field
    The wave equation and the speed of light
    Conservation of energy in the electromagnetic field
    Conservation of linear momentum
    Plane wave solutions
    The polarization of plane waves
    The laws of reflection and refraction
    Fresnel's equations
    The complex field
    The pseudomonochromatic case
    Coherence
    Measurement of $\gamma_{12}$
    Complex correlation and incoherent sources
    The Michelson stellar interferometer
    Coherence and polarization
    Intensity fluctuations

Chapter 8  The Momentum Space Representation
    The time dependence of the state function
    Momentum representation of classical particles
    Maxwell's equations in momentum space
    The field variables in K-space
    The state function in K-space

Chapter 9  Angular Momentum
    Angular momentum of the electromagnetic field
    The angular momentum operators
    Angular momentum eigenvalues
    Orbital angular momentum
    The angular momentum of the photon
    The pure states of the free photon

Appendix I    Matrix Algebra
Appendix II   Jones and Mueller Matrices
Appendix III  Conventions for Polarization States
Appendix IV   The Dirac Delta Function
Appendix V    Useful Vector Identities
Appendix VI   Momentum Space Representations of Linear Momentum and Angular Momentum
Appendix VII  Integrals of the Gaussian Distribution

Bibliography

Index


CHAPTER 1 


MATRIX OPTICS 


Geometric optics is the traditional beginning for a study of optics. There 
are a number of reasons, more or less compelling, for following this tradition. 
If the student has been previously exposed to geometric optics, he will be 
meeting familiar material within the framework of an aesthetically satisfying 
conceptual scheme. Furthermore, this venerable area of optics is an ideal 
introduction to the matrix algebra techniques which are often utilized in this 
book. Last but least, as mentioned above, this is the way optics books usually 
start. 

The prerequisites required for mastery of the material in this chapter are 
slight. It is assumed that the student is familiar with the rectilinear propaga- 
tion of light, the law of reflection, and Snell's law. The matrix algebra 
required is not more advanced than that given in Sections 1 and 3 of 
Appendix I. 


1.1 A STATEMENT OF THE PROBLEM 


The problem that will be considered in this chapter is illustrated in Fig. 1.1. 
Between the wavy lines is some optical system, S, which consists of various 
reflecting and refracting surfaces. System S is such that it possesses a rota- 
tional axis of symmetry, which will be called the system axis or optic axis. 
The intersection of the optic axis with any of the reflecting or refracting 
boundaries is termed the vertex of the boundary. 

Now consider a ray which passes above point 1 on the axis before entering 
the system and after emerging from S passes over point 2. The entrant ray 
and the emergent ray are each defined by their height above the axis and 
their inclination relative to the horizontal. We will assume that there exists 
a linear relation between the entrant and emergent rays of the form 


$$y_2 = s_{11}y_1 + s_{12}\theta_1, \qquad \theta_2 = s_{21}y_1 + s_{22}\theta_1, \qquad (1.1)$$


Fig. 1.1. Entrant and emergent rays for an optical system.


where the coefficients $s_{ij}$ depend only on the system S and not on the
particular ray. Later it will be seen that this is equivalent to assuming that all
angles are small enough that the approximation


$$\sin\theta \approx \tan\theta \approx \theta \qquad (1.2)$$


may be made. This assumption is known as the paraxial approximation, 
and geometric optics in this approximation is called Gaussian optics. 
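To get a feel for how restrictive the paraxial approximation is, the short Python sketch below compares $\sin\theta$ and $\tan\theta$ with $\theta$ itself. It is an illustration added here, not part of the original treatment; the function name `paraxial_errors` and the sample angles are arbitrary choices.

```python
import math

def paraxial_errors(theta):
    """Relative errors made by replacing sin(theta) and tan(theta) with theta (radians)."""
    sin_err = abs(math.sin(theta) - theta) / math.sin(theta)
    tan_err = abs(math.tan(theta) - theta) / math.tan(theta)
    return sin_err, tan_err

for degrees in (1, 5, 10, 20):
    sin_err, tan_err = paraxial_errors(math.radians(degrees))
    print(f"{degrees:2d} deg: sin error {sin_err:.3%}, tan error {tan_err:.3%}")
```

Both errors grow roughly as $\theta^2$ (about $\theta^2/6$ for the sine and $\theta^2/3$ for the tangent), so the approximation is good to about one percent at ten degrees and degrades quickly beyond that.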
Equation (1.1) may conveniently be written as 


$$|R_2\rangle = S\,|R_1\rangle, \qquad (1.3)$$


where 


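$$|R_2\rangle = \begin{pmatrix} y_2 \\ \theta_2 \end{pmatrix}, \qquad |R_1\rangle = \begin{pmatrix} y_1 \\ \theta_1 \end{pmatrix}, \qquad S = \begin{pmatrix} s_{11} & s_{12} \\ s_{21} & s_{22} \end{pmatrix}.$$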

Following the notation of Appendix I, the symbol $|R\rangle$ indicates a column
vector. The information about the entrant and emergent rays has been
included in the column vectors $|R_1\rangle$ and $|R_2\rangle$, and all pertinent information
about the system has been isolated in the $2 \times 2$ matrix $S$.

If we have a composite system as shown in Fig. 1.2, then in the paraxial
approximation we have

$$|R_2\rangle = S_A|R_1\rangle, \qquad |R_3\rangle = S_B|R_2\rangle,$$

and

$$|R_3\rangle = S_B S_A|R_1\rangle.$$

Thus, if the individual system matrices are known, the matrix of the composite
system is the product of the component system matrices, with the matrix of the
first subsystem encountered by the ray standing rightmost. Consequently, if
we are to represent any optical array by a matrix, we must find matrices that
represent the simplest possible subsystems of such an array. This amounts
to determining the matrices that represent refraction, reflection, and
translation. When we have these elementary matrices, any Gaussian array may be
represented as a product of the elementary matrices. The matrix elements
of this product can then provide information about focal lengths, image-
object relations, etc. The determination of the matrix for an optical system
and the extraction from the matrix elements of all pertinent information are
the problems to be examined in this chapter.


Fig. 1.2. A composite optical system.


Fig. 1.3. Simple translation of a ray.
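The rule that composite systems multiply from right to left can be checked numerically. The sketch below stores each $2 \times 2$ matrix as a nested tuple; the helper names `mul` and `apply` and the two subsystem matrices are hypothetical choices made for illustration, not notation from the text.

```python
def mul(A, B):
    """Matrix product AB for 2x2 matrices stored as ((a, b), (c, d))."""
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def apply(S, ray):
    """Emergent ray (y2, theta2) for entrant ray (y1, theta1), as in Eq. (1.1)."""
    y1, theta1 = ray
    return (S[0][0] * y1 + S[0][1] * theta1, S[1][0] * y1 + S[1][1] * theta1)

# Two hypothetical subsystems; the ray passes through S_A first, then S_B.
S_A = ((1.0, 2.0), (0.0, 1.0))
S_B = ((1.0, 0.0), (-0.25, 1.0))

ray_in = (1.0, 0.1)
step_by_step = apply(S_B, apply(S_A, ray_in))
composite = apply(mul(S_B, S_A), ray_in)    # right-to-left product S_B S_A
wrong_order = apply(mul(S_A, S_B), ray_in)  # matrix products do not commute
print(step_by_step, composite, wrong_order)
```

Tracing the ray through each subsystem in turn and applying the single product $S_B S_A$ give the same emergent ray, while the reversed product generally does not.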


1.2 THE MATRIX OPERATORS 


The elementary matrix operators mentioned in the last section will now be 
derived in the paraxial approximation. It is assumed that reflecting and 
refracting surfaces possess rotational symmetry and are spherical, and that 
the refracting media are homogeneous. 


a) The Translation Matrix 


It is a simple matter to construct a matrix which describes a light ray in a
homogeneous medium. It is evident from Fig. 1.3 that the coordinates of the
ray above points 1 and 2 are related by


$$y_2 = y_1 + t\tan\theta_1, \qquad \theta_2 = \theta_1. \qquad (1.4)$$


This relation is not linear, since $\tan\theta_1$ appears rather than $\theta_1$. These equations
can, however, be put in the form of the Gaussian approximation if we limit
ourselves to those rays for which $\theta_1$ is small enough to use Eq. (1.2). If this
is done, Eqs. (1.4) may be written in the compact matrix form
$|R_2\rangle = T_{21}|R_1\rangle$, where the translation matrix $T_{21}$ is given by


$$T_{21} = \begin{pmatrix} 1 & t \\ 0 & 1 \end{pmatrix}, \qquad (1.5)$$

with $t$ considered as a positive quantity.

Here, and in all that follows, heights above the axis are considered as 
positive and angles are taken as positive if the ray has a positive slope, i.e., 
if y increases from left to right. 

In obtaining the matrix $T_{21}$, no consideration was given to the direction
in which light was travelling along the ray, and $T_{21}$ correctly relates the ray
at point 2 to that at point 1 regardless of the direction in which light travels
along the ray. If $|R_2\rangle$ is known and $|R_1\rangle$ is desired, the equation
$|R_2\rangle = T_{21}|R_1\rangle$ may be inverted to yield $|R_1\rangle = T_{21}^{-1}|R_2\rangle$. The inverse matrix is
computed to be


$$T_{21}^{-1} = \begin{pmatrix} 1 & -t \\ 0 & 1 \end{pmatrix}. \qquad (1.6)$$
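Thus, the form of Eq. (1.5) may be used to relate any two points $i$ and $j$
separated by a distance $d$, with $|R_i\rangle = T_{ij}|R_j\rangle$ and

$$T_{ij} = \begin{pmatrix} 1 & d \\ 0 & 1 \end{pmatrix}, \qquad (1.7)$$

where $d$ is positive if point $j$ is to the left of point $i$ and negative if it is to
the right.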
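A quick numerical check of Eqs. (1.6) and (1.7): inverting $T_{21}$ gives the same matrix as Eq. (1.7) with $d = -t$. The helper names `translation` and `inverse` and the value of $t$ are hypothetical, chosen only for this sketch.

```python
def translation(d):
    """T_ij of Eq. (1.7), with |R_i> = T_ij |R_j>; d > 0 when point j is left of point i."""
    return ((1.0, d), (0.0, 1.0))

def inverse(M):
    """Inverse of a 2x2 matrix ((a, b), (c, d)) with nonzero determinant."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det))

t = 3.0
T21 = translation(t)    # point 2 lies a distance t to the right of point 1
T12 = translation(-t)   # here j = 2 lies to the right of i = 1, so d = -t
print(inverse(T21), T12)
```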


b) The Refraction Matrix 


In Fig. 1.4 two distinct regions are separated by a spherical boundary whose
radius of curvature is $r$. As usual, the index of refraction, $n$, is defined as
the ratio of the velocity of light in vacuo, $c$, to the velocity of light in the
medium, $v$: $n = c/v$. The two regions are characterized by their indices of
refraction, $n_1$ and $n_2$. It is also convenient to define a relative index of
refraction $n_{12}$ by $n_{12} = n_1/n_2$. Since the radius drawn to the point of
refraction is normal to the surface, we write, using Snell's law,


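$$n_1\sin\phi_1 = n_2\sin\phi_2, \qquad (1.8)$$

which in the Gaussian approximation becomes

$$n_1\phi_1 = n_2\phi_2. \qquad (1.9)$$

From Fig. 1.4 we see that

$$\phi_1 = \psi + \theta_1, \qquad \phi_2 = \psi + \theta_2, \qquad (1.10)$$

where $\phi_1$ and $\phi_2$ are the angles of incidence and refraction, measured
from the normal, and $\psi$ is the angle that the normal makes with the optic axis.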


Fig. 1.4. Refraction of a ray at a spherical surface. 




To the same approximation, we have $\psi \approx yK$, where the curvature of the
surface is defined as $K = 1/r$. Since the height of the ray does not change in
crossing the boundary, $y = y_1 = y_2$, and Eqs. (1.8), (1.9), and (1.10) may be
solved to yield


$$y_2 = y_1, \qquad \theta_2 = (n_{12} - 1)Ky_1 + n_{12}\theta_1. \qquad (1.11)$$
If $\psi$ is small, then $\delta$ may be neglected.
If $|R_1\rangle$ and $|R_2\rangle$ are the column vectors that represent the incident and
the refracted rays, we can write


$$|R_2\rangle = R_{21}|R_1\rangle, \qquad (1.12)$$
where 


$$R_{21} = \begin{pmatrix} 1 & 0 \\ (n_{12} - 1)K & n_{12} \end{pmatrix}. \qquad (1.13)$$


We have assumed that the curvature is positive when the surface is curved 
as shown, i.e., convex when viewed from the left. In practice it is best to 
enter all curvatures as positive symbols and to enter positive or negative 
values only in calculations. 

As in the development of the translation matrix, we have made no men- 
tion of the direction in which the ray travels, since this was again unnecessary. 
If the ray on the right of the surface is known, the ray on the left may be 
found by inverting Eq. (1.12): 


$$|R_1\rangle = R_{21}^{-1}|R_2\rangle.$$


Here $R_{21}^{-1} = R_{12}$ is found to be


$$R_{12} = \begin{pmatrix} 1 & 0 \\ (n_{21} - 1)K & n_{21} \end{pmatrix}, \qquad n_{21} = n_2/n_1.$$


Thus, the same refraction matrix is used regardless of which way the ray is 
traced, so long as the proper relative index of refraction is used. 
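The statement that $R_{21}^{-1}$ has the same form as $R_{21}$, with $n_{12}$ replaced by $n_{21}$, can be confirmed numerically. In this sketch the indices (air and glass), the curvature, and the helper names `refraction` and `inverse` are hypothetical choices.

```python
def refraction(n_rel, K):
    """R of Eq. (1.13) for relative index n_rel and curvature K = 1/r."""
    return ((1.0, 0.0), ((n_rel - 1.0) * K, n_rel))

def inverse(M):
    """Inverse of a 2x2 matrix ((a, b), (c, d)) with nonzero determinant."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det))

n1, n2 = 1.0, 1.5              # hypothetical: air on the left, glass on the right
K = 0.1                        # r = 10 units, surface convex toward the left
R21 = refraction(n1 / n2, K)   # left to right uses n12 = n1/n2
R12 = refraction(n2 / n1, K)   # right to left uses n21 = n2/n1
print(inverse(R21), R12)
```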


c) The Reflection Matrix 


In Fig. 1.5 a ray is incident from the left on a spherical reflecting surface of 
curvature K. The same sign convention is used here as for refracting surfaces. 
The law of reflection governs the relation between $|R_1\rangle$ and $|R_2\rangle$.

Since the radius of curvature is normal to the surface,

$$|\phi_1| = |\phi_2|. \qquad (1.14)$$

Since both $\theta_1$ and $\psi$ are shown as positive angles,

$$|\phi_1| = \psi + \theta_1. \qquad (1.15)$$




Fig. 1.5. Reflection of a ray at a spherical surface. 


However, $\theta_2$ is drawn as a negative angle, so we have

$$|\phi_2| = |\theta_2| - \psi = -\theta_2 - \psi. \qquad (1.16)$$


Again taking $y = y_1 = y_2$, neglecting $\delta$, and setting $\psi = yK$, Eqs. (1.14),
(1.15), and (1.16) yield

$$y_2 = y_1, \qquad \theta_2 = -2Ky_1 - \theta_1. \qquad (1.17)$$


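The rays $|R_1\rangle$ and $|R_2\rangle$ are thus related by the reflection or mirror matrix
$M_{21}$ through $|R_2\rangle = M_{21}|R_1\rangle$, with

$$M_{21} = \begin{pmatrix} 1 & 0 \\ -2K & -1 \end{pmatrix}. \qquad (1.18)$$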

In the derivation of Eq. (1.18) both the incident and reflected rays were 
taken on the left-hand side of the surface. The solid rays on the left and 
their dashed extensions on the right represent the same rays, having the same 
heights above the axis at the surface and the same inclinations. Equation 
(1.18) would have been obtained if we had assumed the light to be incident 
from the right along the dashed rays. Therefore, the same matrix relates 
incident and reflected rays, for both convex and concave mirrors, regardless
of the side of the mirror on which the rays intersect.
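As an illustration of Eq. (1.18), the sketch below reflects a ray that arrives parallel to the axis from a mirror with positive curvature (convex as seen from the left) and finds where the straight line of the reflected ray meets the axis. The curvature value and the helper name `mirror` are hypothetical.

```python
def mirror(K):
    """Mirror matrix M of Eq. (1.18); K = 1/r, positive for a surface convex toward the left."""
    return ((1.0, 0.0), (-2.0 * K, -1.0))

K = 0.05                  # hypothetical convex mirror, r = 20 units
M = mirror(K)
y1, theta1 = 1.0, 0.0     # incident ray parallel to the axis, at height 1
y2 = M[0][0] * y1 + M[0][1] * theta1
theta2 = M[1][0] * y1 + M[1][1] * theta1

# The line of the reflected ray meets the axis where y2 + x * theta2 = 0, with x
# measured to the right of the vertex. Here x = r/2 > 0: the reflected rays
# appear to diverge from a point behind this convex mirror.
x_cross = -y2 / theta2
det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
print(theta2, x_cross, det)
```

The crossing point lies a distance $r/2 = 1/(2K)$ from the vertex, the familiar focal distance of a spherical mirror, and the determinant is $-1$, a fact used below.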

The optical systems which we will study will consist of various refracting 
and reflecting surfaces separated by different thicknesses of homogeneous 
materials of different indices of refraction. Any such system has a matrix 
which can be written as a product of the elementary matrices we have 
developed. In order to show how these elementary matrices are used to 
obtain a matrix describing a system, we will now derive a matrix for a “thick 
lens” system. The significance of the matrix elements for this system will
become clear in Section 1.4. 




Fig. 1.6. A single thick lens. 


Example 1. In Fig. 1.6 three media with indices $n_1$, $n_2$, and $n_3$ are separated
by two refracting surfaces of radii $r_1$ and $r_2$. The surfaces are a distance $t$
apart. The matrix $V_{41}$ relates a ray passing over point 1 just before the first
surface to the emergent ray which passes over point 4 just after it leaves the 
last surface. This matrix is given as the product of two refracting matrices 
and one translation matrix, 


$$V_{41} = R_{43}T_{32}R_{21}.$$


Note that as we proceed through an optical system from left to right we must 
multiply the appropriate matrices from right to left. The elementary matrices 
are given by 


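$$R_{21} = \begin{pmatrix} 1 & 0 \\ (n_{12} - 1)K_1 & n_{12} \end{pmatrix}, \qquad T_{32} = \begin{pmatrix} 1 & t \\ 0 & 1 \end{pmatrix}, \qquad R_{43} = \begin{pmatrix} 1 & 0 \\ (n_{23} - 1)K_2 & n_{23} \end{pmatrix}.$$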


The reader may verify that the product yields 


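$$V_{41} = \begin{pmatrix} v_{11} & v_{12} \\ v_{21} & v_{22} \end{pmatrix},$$

where, with $n_{13} = n_1/n_3 = n_{12}n_{23}$,

$$v_{11} = 1 + (n_{12} - 1)K_1t, \qquad v_{12} = n_{12}t, \qquad v_{22} = n_{13} + tK_2(n_{13} - n_{12}),$$

$$v_{21} = (n_{13} - n_{23})K_1 + (n_{23} - 1)K_2 + (n_{13} - n_{12} - n_{23} + 1)K_1K_2t.$$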

It should be noted that if the system is as shown in Fig. 1.6, $K_1$ is a positive
quantity and $K_2$ is negative according to the sign convention adopted for
convex and concave surfaces.
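The reader's verification of the product can also be done numerically. This sketch multiplies the three elementary matrices for a hypothetical glass lens in air and compares the result with the closed-form elements $v_{ij}$ above; the numerical values and helper names are illustrative assumptions.

```python
def mul(A, B):
    """Matrix product AB for 2x2 matrices stored as ((a, b), (c, d))."""
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def refraction(n_rel, K):
    return ((1.0, 0.0), ((n_rel - 1.0) * K, n_rel))

def translation(d):
    return ((1.0, d), (0.0, 1.0))

# Hypothetical glass lens in air; K1 > 0 and K2 < 0 as in Fig. 1.6.
n1, n2, n3 = 1.0, 1.5, 1.0
K1, K2, t = 1 / 10, -1 / 15, 2.0
n12, n23, n13 = n1 / n2, n2 / n3, n1 / n3

# The ray meets R21, then T32, then R43, so the product is written right to left.
V41 = mul(refraction(n23, K2), mul(translation(t), refraction(n12, K1)))

closed_form = (
    (1 + (n12 - 1) * K1 * t, n12 * t),
    ((n13 - n23) * K1 + (n23 - 1) * K2 + (n13 - n12 - n23 + 1) * K1 * K2 * t,
     n13 + t * K2 * (n13 - n12)),
)
print(V41)
print(closed_form)
```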

An extremely useful relation may be obtained by noting that the determinants
of the T-, R-, and M-matrices are, respectively, $1$, $n_{12}$, and $-1$.
The determinant of a product of matrices is the product of the individual
determinants. It is left to the reader to show that this implies that the
determinant of the matrix for an optical system represented by the matrix $S$ is

$$\det S = \frac{n_{\text{initial}}}{n_{\text{final}}}\,(-1)^N, \qquad (1.19)$$

where $N$ is the number of reflecting surfaces in the system. Since

$$\det S = s_{11}s_{22} - s_{12}s_{21},$$

given any three elements of $S$, the fourth may be found if the initial and final
media are known and if the number of reflecting surfaces is also known.
We will often write $\det S = D$, or $|S|$.
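A numerical illustration of Eq. (1.19) and of recovering a missing element. The system here is hypothetical: light enters glass from air through a curved surface, crosses a thickness of glass, and is reflected by a silvered back surface, so the final medium is glass and $N = 1$. All numbers and helper names are assumptions made for the sketch.

```python
def mul(A, B):
    """Matrix product AB for 2x2 matrices stored as ((a, b), (c, d))."""
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def det(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

n_glass = 1.5
R = ((1.0, 0.0), ((1 / n_glass - 1) * 0.1, 1 / n_glass))  # air to glass, K = 0.1
T = ((1.0, 2.0), (0.0, 1.0))                               # thickness t = 2
M = ((1.0, 0.0), (-2.0 * (-0.05), -1.0))                   # mirror, K = -0.05
S = mul(M, mul(T, R))

D = (1.0 / n_glass) * (-1) ** 1   # Eq. (1.19): (n_initial / n_final)(-1)^N

# Recover s22 from the other three elements and D (this requires s11 != 0).
(s11, s12), (s21, s22) = S
s22_recovered = (D + s12 * s21) / s11
print(det(S), D, s22, s22_recovered)
```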


1.3 THE S- AND V-MATRICES 


A matrix associated with an optical system is also associated with two points 
on the optic axis and with two planes perpendicular to the axis which pass 
through these points, e.g., points 1 and 2 in Fig. 1.1. Such a matrix may be
termed a system matrix S. These reference points and planes are not unique. 
Much as the origin of a coordinate system may be placed wherever con- 
venient, so the reference points of an optical system may be taken wherever 
we wish. Moving the reference points will, of course, modify the appearance 
of the system matrix, but cannot alter the behavior of the actual physical 
optical system. 

If the state of a ray is known as it passes through the first reference plane, 
the matrix of the system allows us to find the state of the ray as it passes 
through the second plane. It is often useful to pick out as initial and final 
planes those which touch the initial and final vertices of the optical system. 
The matrix between these planes will be called the vertex matrix and is given 
the symbol V. Any matrix associated with the system may be termed a 
system matrix S, with the V-matrix being a special case of S. 

If for a given system an S-matrix is known between two points 1 and 2,
then it is possible to find the system matrix $S'$, with elements $s'_{ij}$, between
any other two points 1′ and 2′, as indicated in Fig. 1.7.

The quantities $z$ and $z'$ are taken as positive if measured from left to
right. The system matrix connecting 1′ and 2′ is then given by

$$S' = T_{2'2}\,S\,T_{11'} = \begin{pmatrix} 1 & z' \\ 0 & 1 \end{pmatrix}\begin{pmatrix} s_{11} & s_{12} \\ s_{21} & s_{22} \end{pmatrix}\begin{pmatrix} 1 & z \\ 0 & 1 \end{pmatrix},$$


Fig. 1.7. Translation of reference points for an optical system.


where the points 1′ and 1 are connected through $T_{11'}$ and the points 2 and 2′
through $T_{2'2}$. The result of the matrix multiplication is


$$S' = \begin{pmatrix} s_{11} + z's_{21} & zs_{11} + z's_{22} + zz's_{21} + s_{12} \\ s_{21} & s_{22} + zs_{21} \end{pmatrix}. \qquad (1.20)$$


The matrix given in Eq. (1.20) is of basic importance to what follows. It 
can, for example, permit an evaluation of image-object relationships. The 
reader should be sure that he understands its derivation and the meaning 
of each symbol in this equation; in particular the meaning of positive and 
negative values for z and z’. 
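Equation (1.20) can be checked by multiplying the three matrices numerically and comparing with the element-by-element formula. The system matrix and the shifts below are arbitrary, and the function names are hypothetical. Note that the lower-left element never changes, which anticipates the discussion of $s_{21}$ in Section 1.4.

```python
def mul(A, B):
    """Matrix product AB for 2x2 matrices stored as ((a, b), (c, d))."""
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def translation(d):
    return ((1.0, d), (0.0, 1.0))

def shift_by_products(S, z, zp):
    """S' = T_{2'2} S T_{11'}: point 1' lies z left of 1, point 2' lies zp right of 2."""
    return mul(translation(zp), mul(S, translation(z)))

def shift_by_eq_1_20(S, z, zp):
    """The same matrix written out element by element, as in Eq. (1.20)."""
    (s11, s12), (s21, s22) = S
    return ((s11 + zp * s21, z * s11 + zp * s22 + z * zp * s21 + s12),
            (s21, s22 + z * s21))

S = ((0.9, 1.3), (-0.08, 1.0))   # an arbitrary system matrix
for z, zp in ((4.0, 7.5), (-2.0, 3.0)):
    print(shift_by_products(S, z, zp), shift_by_eq_1_20(S, z, zp))
```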


1.4 SIGNIFICANCE OF THE INDIVIDUAL ELEMENTS $s_{ij}$


The four elements $s_{ij}$ of a system matrix contain all the information
that describes that system in the paraxial approximation. This information
is shared among the elements. From Eq. (1.19) the determinant of a system 
matrix cannot be zero. This implies that at most two elements of the matrix 
can be zero, and if two are zero, they must be the two diagonal elements or 
the two off-diagonal elements. The implications of vanishing matrix elements 
will now be investigated. 


a) $s_{11} = 0$

If $s_{11}$ is zero, Eqs. (1.1) imply

$$y_2 = s_{12}\theta_1, \qquad \theta_2 = s_{21}y_1 + s_{22}\theta_1. \qquad (1.21)$$


Thus, if $s_{11}$ is zero, the height of a ray at 2 is independent of its original
height at 1. All rays which pass through the initial plane with the same
inclination, $\theta_1$, pass through a common point in the final plane. This is
illustrated in Fig. 1.8. 

That plane in which all parallel entering rays pass through a common
point is called the second focal plane of the system, $FP_2$. The point of
intersection of the second focal plane and the optic axis is called the second focal
point, $Fp_2$.


Fig. 1.8. Formation of an image in the second focal plane of an optical system for a
set of parallel entering rays.


In general, $s_{11}$ will not be zero for a system matrix and the point 2 will
not be the second focal point, $Fp_2$. If $s_{11}$ is not zero, let us assume that $Fp_2$
is some distance $z'$ to the right of point 2. Then, point 2′ must be $Fp_2$, and
in Eq. (1.20) $s'_{11}$ must be zero. This yields


$$z' = -s_{11}/s_{21}. \qquad (1.22)$$


Thus, given a system matrix $S$ between any two points 1 and 2, the second
focal point, $Fp_2$, is a distance $-s_{11}/s_{21}$ to the right of point 2. If $-s_{11}/s_{21}$
happens to be a negative quantity, the second focal point is to the left of
point 2.
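To see Eq. (1.22) at work, the sketch below takes an arbitrary system matrix, computes $z'$, and follows several rays that enter parallel to the axis at different heights; after a further distance $z'$ each ray is back on the axis. The matrix and the function name are hypothetical.

```python
def second_focal_distance(S):
    """z' = -s11/s21 (Eq. 1.22): distance of Fp2 to the right of reference point 2."""
    (s11, _), (s21, _) = S
    return -s11 / s21

S = ((0.9, 1.3), (-0.08, 1.0))   # an arbitrary system matrix with s21 != 0
zp = second_focal_distance(S)

# Rays entering parallel to the axis (theta1 = 0) at different heights h:
for h in (0.5, 1.0, 2.0):
    y2, theta2 = S[0][0] * h, S[1][0] * h
    print(h, y2 + zp * theta2)   # height after a further distance zp (zero up to rounding)
```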


b) $s_{22} = 0$

When $s_{22}$ is zero, we have from Eqs. (1.1)

$$y_2 = s_{11}y_1 + s_{12}\theta_1, \qquad \theta_2 = s_{21}y_1. \qquad (1.23)$$


All rays which pass through the initial plane at a height $y_1$ above the axis
emerge from the system at an inclination $\theta_2$ which is independent of their
inclination $\theta_1$. This is illustrated in Fig. 1.9.

That plane to the left of the system such that the emergent inclination is 
independent of the entering inclination, depending only on the entrance 
height, is called the first focal plane of the system, $FP_1$. The point where this
plane intersects the optic axis is called the first focal point, $Fp_1$. Alternatively,
it is the plane for which parallel rays entering the system from the right pass 
through a common point when emerging on the left. 

As before, the system matrix will usually not have $s_{22} = 0$. In this case
we assume that $Fp_1$ is some distance $z$ to the left of point 1. Then, demanding
that $s'_{22} = 0$ in Eq. (1.20), we obtain


$$z = -s_{22}/s_{21}. \qquad (1.24)$$


Therefore, if the elements of the system matrix $S$, between points 1 and 2,
are known, $Fp_1$ is a distance $-s_{22}/s_{21}$ to the left of point 1. As before, if
$-s_{22}/s_{21}$ is negative, $Fp_1$ is to the right of point 1.
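The companion check for Eq. (1.24): rays leaving a single point in the first focal plane with different slopes should all emerge with the same inclination. The system matrix, the height `h`, and the function name are hypothetical choices for this sketch.

```python
def first_focal_distance(S):
    """z = -s22/s21 (Eq. 1.24): distance of Fp1 to the left of reference point 1."""
    (_, _), (s21, s22) = S
    return -s22 / s21

S = ((0.9, 1.3), (-0.08, 1.0))   # an arbitrary system matrix with s21 != 0
z = first_focal_distance(S)

# Rays leaving one point in the first focal plane (height h) with different slopes:
h = 0.3
exit_angles = []
for theta1 in (-0.05, 0.0, 0.05):
    y1 = h + z * theta1          # height on reaching reference plane 1
    exit_angles.append(S[1][0] * y1 + S[1][1] * theta1)
print(z, exit_angles)            # every entry equals s21 * h up to rounding
```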


Fig. 1.9. Rays diverging from a point in the first focal plane of an optical system
leave as a parallel system of rays.


Fig. 1.10. Behavior of a system of parallel rays in a telescopic system.


c) $s_{21} = 0$


In the first two cases, if $s_{11}$ or $s_{22}$ was not zero, it was possible to translate
the reference points so that $s'_{11}$ or $s'_{22}$ was zero. This is not possible for $s_{21}$, since
in Eq. (1.20) it is seen that $s'_{21} = s_{21}$; translation of the reference points
has no effect on this element. This is the only element which translation does
not affect, and $s_{21}$ must therefore be determined only by the system and not
by the location of the reference points 1 or 2.

The special significance of $s_{21}$ will be examined later, but at this point
we consider the very special case $s_{21} = 0$. From Eq. (1.1) we then have


$$y_2 = s_{11}y_1 + s_{12}\theta_1, \qquad \theta_2 = s_{22}\theta_1, \qquad (1.25)$$


and the emergent angle of any ray depends only on the entrance inclination.
In Fig. 1.10, illustrating this case, the points 1 and 2 are not shown; for if
$s_{21}$ is zero between one pair of reference points, it is zero between any pair.
A system for which $s_{21}$ is zero is called a telescopic system. The angular
magnification of such a system is defined to be $\theta_2/\theta_1$ and is given by


$$\text{Angular magnification} = s_{22}. \qquad (1.26)$$


It is important to note that although angular magnification may be defined 
for any ray in any system, it may be different for different rays in a given 
system. Only for a telescopic system is the phrase "angular magnification 
of the system" meaningful. 
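A familiar telescopic system is two converging thin lenses separated by the sum of their focal lengths (the astronomical telescope arrangement). The sketch below uses the thin-lens matrix of Example 2, written as $((1, 0), (-1/f, 1))$ with $1/f = (n - 1)(K_1 - K_2)$, and checks that $s_{21}$ vanishes and that the angular magnification is $-f_1/f_2$. The focal lengths and helper names are hypothetical.

```python
def mul(A, B):
    """Matrix product AB for 2x2 matrices stored as ((a, b), (c, d))."""
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def thin_lens(f):
    """Thin-lens matrix of Example 2, with 1/f = (n - 1)(K1 - K2)."""
    return ((1.0, 0.0), (-1.0 / f, 1.0))

def translation(d):
    return ((1.0, d), (0.0, 1.0))

f1, f2 = 50.0, 10.0   # hypothetical focal lengths of the first and second lens
S = mul(thin_lens(f2), mul(translation(f1 + f2), thin_lens(f1)))
print(S[1][0], S[1][1])   # s21 (zero up to rounding) and the angular magnification s22
```

The negative magnification means the image is inverted; because $s_{21} = 0$, the same value of $s_{22}$ is obtained whichever reference points are chosen.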


d) $s_{12} = 0$

If $s_{12}$ is zero, Eqs. (1.1) yield

$$y_2 = s_{11}y_1, \qquad \theta_2 = s_{21}y_1 + s_{22}\theta_1, \qquad (1.27)$$


Fig. 1.11. Conjugate planes of an optical system.


which, as illustrated in Fig. 1.11, implies that the height of a ray passing 
through the second plane depends only on the height of the ray when it left 
the first plane and is independent of $\theta_1$. All rays leaving a common point in
the first plane converge at a common point in the second plane. Such points 
and planes are said to be conjugate points and planes, respectively. If one 
of the points is a source of light, its image is simply its conjugate point and 
the corresponding planes are frequently called object and image planes. 
From Eq. (1.20), setting $s'_{12} = 0$, it is seen that, given any system matrix
between points 1 and 2, pairs of conjugate points exist a distance z to the 
left of 1 and z' to the right of 2 related by 


$$\frac{s_{11}}{z'} + \frac{s_{22}}{z} + \frac{s_{12}}{zz'} = -s_{21}. \qquad (1.28)$$


When two planes associated with an optical system are conjugate, one may
define the lateral magnification between these planes. Referring to Fig. 1.11,
the definition is given as

$$\text{Lateral magnification} = y_2/y_1 = s_{11}. \qquad (1.29)$$
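Finding a conjugate pair amounts to solving $s'_{12} = 0$ in Eq. (1.20) for $z'$ once $z$ is chosen. The sketch below does this for an arbitrary system matrix and reads off the lateral magnification as $s'_{11}$. The matrix, the object distance, and the function names are hypothetical; the formula fails (division by zero) when the object lies in the first focal plane, where the image is at infinity.

```python
def image_distance(S, z):
    """Solve s'_12 = 0 from Eq. (1.20) for z', given an object plane a distance z left of point 1."""
    (s11, s12), (s21, s22) = S
    return -(z * s11 + s12) / (z * s21 + s22)

def shifted(S, z, zp):
    """S' from Eq. (1.20)."""
    (s11, s12), (s21, s22) = S
    return ((s11 + zp * s21, z * s11 + zp * s22 + z * zp * s21 + s12),
            (s21, s22 + z * s21))

S = ((0.9, 1.3), (-0.08, 1.0))   # an arbitrary system matrix
z = 30.0                         # object plane 30 units to the left of point 1
zp = image_distance(S, z)
Sp = shifted(S, z, zp)
print(zp, Sp[0][1], Sp[0][0])    # image distance, s'_12 (zero up to rounding), magnification
```

Because $s'_{12} = 0$, the determinant of $S'$ reduces to $s'_{11}s'_{22}$, so the lateral magnification and $s'_{22}$ are tied together by Eq. (1.19).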


A good illustration of how the foregoing results apply is given below for a 
thin lens. 


Example 2. By a thin lens we understand the optical system of Example 1
with the distance $t$ small enough to be ignored. We further assume that
$n_1 = n_3 = 1$ and set $n_2 = n$. The vertex matrix $V_{41}$ in Example 1 then
becomes


$$V_{41} = \begin{pmatrix} 1 & 0 \\ -(n - 1)(K_1 - K_2) & 1 \end{pmatrix},$$


and from Eqs. (1.22) and (1.24) the first and second focal points are located
at distances $f$ to the left and right of the lens, respectively, given by

$$\frac{1}{f} = (n - 1)(K_1 - K_2). \qquad (1.30)$$


Fig. 1.12. The principal planes and points of an optical system.


The relation between object and image distances measured from the lens is
given by Eq. (1.28) as

$$\frac{1}{z} + \frac{1}{z'} = \frac{1}{f}, \qquad (1.31)$$

which is the well-known thin lens equation.
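A small numerical instance of Example 2: compute $f$ from Eq. (1.30) for a hypothetical biconvex lens, find the image of an object by imposing $s'_{12} = 0$, and confirm the thin lens equation (1.31). The index, curvatures, object distance, and function name are all assumptions made for this sketch.

```python
def image_distance(S, z):
    """z' from s'_12 = 0 in Eq. (1.20), for an object a distance z to the left."""
    (s11, s12), (s21, s22) = S
    return -(z * s11 + s12) / (z * s21 + s22)

n, K1, K2 = 1.5, 1 / 20, -1 / 20   # hypothetical biconvex lens, |r| = 20 units
f = 1 / ((n - 1) * (K1 - K2))      # Eq. (1.30)
V41 = ((1.0, 0.0), (-1.0 / f, 1.0))

z = 60.0                           # object 60 units to the left of the lens
zp = image_distance(V41, z)
print(f, zp, 1 / z + 1 / zp, 1 / f)   # the last two values agree, Eq. (1.31)
```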


1.5 THE PRINCIPAL PLANES AND LENS EQUATIONS 


Any ray which enters an optical system from the left, parallel to the optic 
axis, passes through the second focal point. Any ray which passes through 
the first focal point leaves the system traveling parallel to the axis. Typical 
rays of each kind are shown in Fig. 1.12. These rays may be refracted and 
reflected many times in passing through the system and we do not know 
the exact path which they follow. In Fig. 1.12 the rays have been extended 
into the system as dotted lines until the entering ray intersects the emergent 
ray. The entire effect of the system on each ray can in this way be represented 
by one deviation, located at the planes $PP_1$ and $PP_2$ which intersect the axis
at the points $Pp_1$ and $Pp_2$. These planes and points are called, respectively,
the first and second principal planes and points of the system. 

From Fig. 1.12 it is seen that if the locations of the principal planes and 
focal points are known, the image location for any object may be obtained 
by a geometric construction. The order from left to right of object, image, 
focal points, and principal planes need not be as given in Fig. 1.12. For 
example, in Fig. 1.13 the object is placed inside the first focal point and the 
order of the principal planes is reversed. Ray A, which enters parallel to the 
axis, must always be extended until it meets the second principal plane and
then bent to pass through Fp2. Ray B, although it does not actually pass
through Fp1, behaves as if it had and, hence, must be extended until it
strikes PP1 and then bent to leave the system parallel to the axis. Rays A
and B never do pass through a real image point. Extended backward,
however, they intersect at point I. Such an image is called a virtual image.
Although the rays do not pass through I, upon leaving the system they
behave as if they were diverging from I.

Fig. 1.13. Ray tracing using the principal planes.

The examples given are for a system which must possess an even number 
of reflecting surfaces. The rays enter from the left and leave on the right. 
Ray tracing for a mirror system, or a system with an odd number of reflecting 
surfaces, proceeds in exactly the same manner. However, when the final 
rays are found, they must be extended backward to emerge from the left of 
the system. 

If we assume that none of the four points Fp1, Fp2, Pp1, and Pp2 are
coincident, and since their locations can in principle be independently varied,
there are 24 different orders in which they may occur along the axis. If the
different locations of the object are then considered, we find in general 120 
different cases in which an image location may be determined by the ray 
tracing techniques illustrated in Figs. 1.12 and 1.13. The remaining 118 
cases are left for the problems! 
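The count quoted above can be reproduced by brute force. In this sketch, reading "different locations of the object" as the five intervals into which four distinct points divide the axis is an interpretation, not a statement from the text.

```python
from itertools import permutations

points = ("Fp1", "Fp2", "Pp1", "Pp2")

# Every left-to-right order of four distinct points along the axis.
orders = list(permutations(points))

# Four distinct points split the axis into five intervals; the object may
# sit in any one of them, so each order yields five cases.
intervals_per_order = len(points) + 1
cases = len(orders) * intervals_per_order

print(len(orders), cases)  # 24 120
```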

We now locate the principal points of an optical system in terms of the 
matrix elements which determine that system. Consider an object which is 
very close to the first principal plane. Its image is found by ray tracing in 
Fig. 1.14. It is seen that the image is very close to the second principal plane 
and that in the limit as the object approaches PP1, the image approaches PP2
and y2 approaches y1. Therefore, when an object is located in PP1, the
image is located in PP2 and the lateral magnification is unity. The principal
planes are often called the planes of unit lateral magnification. 

Assume that the system matrix, S, is given between two points 1 and 2,
as in Fig. 1.7. Furthermore assume that PP1 is located at 1', a distance z to
the left of 1, and PP2 at 2', a distance z' to the right of 2. Since the principal
planes are conjugate, s'12 = 0 in Eq. (1.20), and since they are planes of unit
magnification, s'11 = 1. These yield

$$ s_{11} + z' s_{21} = 1, \qquad (1.32) $$
$$ z s_{11} + z' s_{22} + z z' s_{21} + s_{12} = 0. \qquad (1.33) $$

Fig. 1.14. Geometric construction leading to the definition of the principal planes
as planes of unit lateral magnification.


Eq. (1.32) implies that, given any system matrix between points 1 and 2, the
second principal point is located a distance to the right of 2 given by

$$ z' = \frac{1 - s_{11}}{s_{21}} = Pp_2. \qquad (1.34) $$

By Eq. (1.32), the terms z s11 + z z' s21 in Eq. (1.33) reduce to z, so that
z + z' s22 + s12 = 0. If Eq. (1.34) is placed in this result, we find the first
principal point located a distance to the left of 1 given by

$$ z = \frac{\operatorname{Det}(S) - s_{22}}{s_{21}} = Pp_1, \qquad (1.35) $$

where Det(S) = s11 s22 − s12 s21.
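Eqs. (1.34) and (1.35) can be checked by moving the reference points to the computed locations and confirming that the new matrix has s'12 = 0 and s'11 = 1. The sketch assumes the translation matrix ((1, d), (0, 1)) and takes the product S' = T(z') S T(z) as the form of Eq. (1.20); the system matrix S is hypothetical, chosen only so that Det(S) = 5/6 differs from unity.

```python
from fractions import Fraction as Fr


def matmul(a, b):
    """Product a @ b of 2x2 matrices stored as ((m11, m12), (m21, m22))."""
    return (
        (a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]),
        (a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]),
    )


def translation(d):
    """Assumed translation matrix over an axial distance d."""
    return ((1, d), (0, 1))


def principal_points(S):
    """Return (Pp1, Pp2): Pp1 to the left of point 1, Pp2 to the right of point 2."""
    (s11, s12), (s21, s22) = S
    det = s11 * s22 - s12 * s21
    z = (det - s22) / s21    # Eq. (1.35)
    zp = (1 - s11) / s21     # Eq. (1.34)
    return z, zp


# Hypothetical system matrix with Det(S) = 5/6.
S = ((Fr(1, 2), Fr(1)), (Fr(-1, 6), Fr(4, 3)))
z, zp = principal_points(S)

# Matrix between the principal planes, S' = T(z') S T(z).
S_pp = matmul(translation(zp), matmul(S, translation(z)))
print(z, zp)                    # 3 -3
print(S_pp[0][0], S_pp[0][1])   # 1 0  (unit magnification, conjugate planes)
```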

Given an arbitrary system matrix, the locations of the focal points of the
system are given by Eqs. (1.22) and (1.24). Combining these equations with
Eqs. (1.34) and (1.35), we learn that Pp1 is a distance f1 to the right of Fp1,
given by

$$ f_1 = -\frac{\operatorname{Det}(S)}{s_{21}}, \qquad (1.36) $$

and the distance to the right from Pp2 to Fp2, called f2, is

$$ f_2 = -\frac{1}{s_{21}}. \qquad (1.37) $$

The quantities f1 and f2 are known, respectively, as the first and second focal
lengths of the system. It is easily seen that for any system

$$ \frac{f_1}{f_2} = \operatorname{Det}(S). \qquad (1.38) $$


From Eq. (1.19) it is seen that the magnitude of the focal lengths will be 
equal only if the initial and final indices of refraction are equal. 
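The focal lengths and Eq. (1.38) can be checked in the same spirit. The sketch below, using a hypothetical system matrix with Det(S) = 5/6, computes f1 and f2 from Eqs. (1.36) and (1.37) and confirms that they are the separations of the focal and principal points, with Fp1 and Pp1 measured leftward from point 1 and Fp2 and Pp2 rightward from point 2.

```python
from fractions import Fraction as Fr


def focal_data(S):
    """Return Det(S), f1, f2 and the focal/principal point locations of S."""
    (s11, s12), (s21, s22) = S
    det = s11 * s22 - s12 * s21
    f1 = -det / s21                                 # Eq. (1.36)
    f2 = -1 / s21                                   # Eq. (1.37)
    Fp1, Fp2 = -s22 / s21, -s11 / s21               # focal points
    Pp1, Pp2 = (det - s22) / s21, (1 - s11) / s21   # Eqs. (1.35), (1.34)
    return det, f1, f2, Fp1, Fp2, Pp1, Pp2


S = ((Fr(1, 2), Fr(1)), (Fr(-1, 6), Fr(4, 3)))     # hypothetical, Det(S) = 5/6
det, f1, f2, Fp1, Fp2, Pp1, Pp2 = focal_data(S)

# Fp1 and Pp1 are distances to the left of point 1, so Pp1 lies f1 to the right of Fp1.
assert Fp1 - Pp1 == f1
# Pp2 and Fp2 are distances to the right of point 2, so Fp2 lies f2 to the right of Pp2.
assert Fp2 - Pp2 == f2
assert f1 / f2 == det                               # Eq. (1.38)
print(f1, f2)                                       # 5 6
```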

It is of great practical importance to note that the second focal length
of any optical system, whose system matrix S relates any two points 1 and 2,
is always given by the negative reciprocal of the element s21. This is the
special significance of this element which was mentioned earlier. We have 
seen how to trace two very special rays through a system, one which enters 
parallel to the axis and one which leaves parallel. We now determine how 
to trace an arbitrary ray. 

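The matrix which connects the principal points of a system will be
called the P-matrix of the system. Since the principal planes are conjugate
planes of unit magnification, the element p12 must be zero and p11 must be
unity; since the determinant of P is D = Det(S), p22 = D, and we may
immediately write, using Eq. (1.37),

$$ P = \begin{pmatrix} 1 & 0 \\ -1/f_2 & D \end{pmatrix}. \qquad (1.39) $$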


If we operate with P on an arbitrary ray striking PP1, we see that it must
leave PP2 at the same height above the axis as it had at PP1. Only the angle
at which it leaves PP2 must be determined. This angle is found by imagining
that the arbitrary ray is one of a set of parallel rays striking PP1 which must
pass through a common point in the second focal plane. One of these rays
must pass through the first focal point and, hence, leave the second principal
plane parallel to the axis. Where this ray intersects the plane FP2, so does
the arbitrary ray. This is illustrated in Fig. 1.15.

Fig. 1.15. Ray tracing for an arbitrary ray.
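The construction of Fig. 1.15 can also be verified with the P-matrix itself. In the sketch below (hypothetical values f2 = 6 and D = 5/6, and an assumed translation matrix from PP2 to the second focal plane FP2), a bundle of parallel rays of common inclination θ strikes PP1 at several heights; every ray reaches FP2 at the same height, f1θ, which depends only on θ.

```python
from fractions import Fraction as Fr


def apply(M, ray):
    """Act with a 2x2 matrix on a ray vector (y, theta)."""
    (m11, m12), (m21, m22) = M
    y, theta = ray
    return (m11 * y + m12 * theta, m21 * y + m22 * theta)


# Hypothetical system: second focal length 6 and determinant D = 5/6.
f2, D = Fr(6), Fr(5, 6)
f1 = D * f2
P = ((1, 0), (-1 / f2, D))            # P-matrix, Eq. (1.39)
to_focal_plane = ((1, f2), (0, 1))    # assumed translation PP2 -> FP2

theta = Fr(1, 100)                    # common inclination of a parallel bundle
landing = set()
for y in (Fr(-2), Fr(0), Fr(1), Fr(3)):
    y2, theta2 = apply(P, (y, theta))
    assert y2 == y                    # the ray leaves PP2 at its height at PP1
    y_focal, _ = apply(to_focal_plane, (y2, theta2))
    landing.add(y_focal)

print(landing == {f1 * theta})        # True: the whole bundle meets FP2 at one height
```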

The matrix connecting the principal points is given by Eq. (1.39). Using
this matrix, we may find the matrix connecting a point O, a distance z to
the left of Pp1, and a point I, a distance z' to the right of Pp2, and thereby
derive the Gaussian form of the equation relating image and object distances.

The matrix connecting O and I is

$$ M = T_2 P T_1 = \begin{pmatrix} 1 - z'/f_2 & z + z'D - zz'/f_2 \\ -1/f_2 & D - z/f_2 \end{pmatrix}, \qquad (1.40) $$

where T1 and T2 are the translation matrices over the distances z and z'.


If we demand that O and I be conjugate, we stipulate

$$ z + z'D - \frac{zz'}{f_2} = 0. \qquad (1.41) $$

Dividing by zz'/f2 and remembering that D = f1/f2, we obtain

$$ \frac{f_1}{z} + \frac{f_2}{z'} = 1. \qquad (1.42) $$


If the final and initial media have the same index of refraction and the
system contains an even number of reflecting surfaces, D = 1 and we
obtain the Gaussian form of the lens equation,

$$ \frac{1}{z} + \frac{1}{z'} = \frac{1}{f}, \qquad (1.43) $$

where f1 = f2 = f.
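Eqs. (1.40) to (1.43) lend themselves to a quick numerical check. The sketch below solves Eq. (1.42) for z', builds M = T2 P T1 with an assumed translation matrix, and confirms that m12 vanishes, so that O and I are conjugate; m11 is then the lateral magnification of Eq. (1.29). The focal lengths are hypothetical.

```python
from fractions import Fraction as Fr


def matmul(a, b):
    """Product a @ b of 2x2 matrices stored as ((m11, m12), (m21, m22))."""
    return (
        (a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]),
        (a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]),
    )


def translation(d):
    """Assumed translation matrix over an axial distance d."""
    return ((1, d), (0, 1))


def object_image_matrix(z, zp, f2, D):
    """M of Eq. (1.40): O lies z left of Pp1, I lies z' right of Pp2."""
    P = ((1, 0), (-1 / f2, D))                     # Eq. (1.39)
    return matmul(translation(zp), matmul(P, translation(z)))


def image_distance(z, f1, f2):
    """Solve Eq. (1.42), f1/z + f2/z' = 1, for z' (requires z != f1)."""
    return f2 / (1 - f1 / z)


# General system with hypothetical focal lengths, so D = f1/f2 = 5/6.
f1, f2 = Fr(5), Fr(6)
z = Fr(15)
zp = image_distance(z, f1, f2)
M = object_image_matrix(z, zp, f2, f1 / f2)
print(zp, M[0][1], M[0][0])    # 9 0 -1/2

# Lens case of Eq. (1.43): D = 1 and f1 = f2 = f.
f = Fr(10)
zp_lens = image_distance(Fr(30), f, f)
assert 1 / Fr(30) + 1 / zp_lens == 1 / f
```

Here the image of an object 15 units from Pp1 forms 9 units beyond Pp2, inverted and at half size.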


Fig. 1.16. Geometry for the Newtonian form of the lens equation.


If the system contains an odd number of reflecting surfaces (again with equal
initial and final indices, so that D = −1), the Gaussian form of the mirror
equation is obtained,

$$ \frac{1}{z} - \frac{1}{z'} = \frac{1}{f}, \qquad (1.44) $$

where f1 = −f2 = f. Known as the “thin mirror” equation, this is the
equation obtained in elementary treatments of a single reflecting surface.
By contrast, a thick mirror system would be any system containing more 
than a single reflecting surface. 

Thus, any Gaussian system obeys either the thin lens equation or the 
thin mirror equation if object and image distances are measured from the 
principal points. 

An alternative form of Eq. (1.43), first obtained by Newton, is found if
distances are measured not from the principal planes but from the focal
points. In Fig. 1.16 let x = z − f1, x' = z' − f2. Substituting into Eq.
(1.42) leads to

$$ x x' = f_1 f_2, \qquad (1.45) $$

or, if D = ±1,

$$ x x' = \pm f^2. \qquad (1.46) $$
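Newton's form can be confirmed for a few object positions. The sketch below (hypothetical focal lengths; z must not equal f1, where the image recedes to infinity) computes z' from Eq. (1.42) and checks Eq. (1.45), then the mirror case of Eq. (1.46).

```python
from fractions import Fraction as Fr


def image_distance(z, f1, f2):
    """z' from Eq. (1.42), f1/z + f2/z' = 1 (z from Pp1, z' from Pp2)."""
    return f2 / (1 - f1 / z)


def newton_product(z, f1, f2):
    """x x' with x = z - f1 and x' = z' - f2 (distances from the focal points)."""
    zp = image_distance(z, f1, f2)
    return (z - f1) * (zp - f2)


# Hypothetical general system: x x' should equal f1 f2 = 30 for every object position.
for z in (Fr(7), Fr(15), Fr(-4), Fr(40)):
    assert newton_product(z, Fr(5), Fr(6)) == 30          # Eq. (1.45)

# Mirror case D = -1, f1 = -f2 = f: x x' = -f**2, Eq. (1.46).
f = Fr(4)
assert newton_product(Fr(12), f, -f) == -f**2
print("Newton's form verified")
```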


1.6 THE NODAL POINTS 


The matrix of an optical system connecting two reference points contains 
only four independent elements, and only three are independent if the initial 
and final indices are specified. If the locations of the principal points and 
focal points are given, the matrix elements may be calculated. Experimentally,
the focal points may be determined by locating the image of a distant
object, which, from Eq. (1.42), must lie in the focal plane. The experimental 
location of the principal planes is not always so easy. A third set of points, 
the nodal points, which may be readily determined by experiment, is therefore 
defined. It will be seen that the nodal points, Np1 and Np2, and the principal
points coincide when the determinant of the system is unity. 

We have seen that any ray striking PP1 at height y above the axis must
emerge from PP2 at the same height. In particular, the ray which strikes
Pp1 must travel along the optic axis and emerge at Pp2. Using the P-matrix,
we see that the emergent angle of this ray is equal to the incident angle
only if D = 1. This follows since, if y = 0, Eq. (1.39) implies θ2 = Dθ1.
This is illustrated in Fig. 1.17, where θ2 ≠ θ1. Thus, the principal ray (the
ray which strikes the axis at Pp1 and leaves the axis at Pp2) has unequal
initial and final inclinations. The nodal ray is defined as that ray for which
the initial and final inclinations are equal. As in Fig. 1.17, the intersections
with the optic axis of the entrant and emergent nodal rays are called the
first and second nodal points of the system, Np1 and Np2. The planes
perpendicular to the axis which pass through the nodal points are called the
first and second nodal planes, NP1 and NP2. The existence of the nodal
points is verified by determining their location in terms of the matrix elements
which determine the system.

Fig. 1.17. Location of the nodal points of an optical system.

Assuming that the nodal points exist, the matrix which connects them
can be found. The matrix connecting the nodal points will be given the
symbol N, with elements nij. It is seen that n12 must be zero, since y2 is zero
if y1 is zero. Since θ2 must equal θ1 when y1 = y2 = 0, n22 must be unity.
These conditions, together with Det(N) = D and n21 = s21 = −1/f2 (translations
leave the lower left element unchanged), imply

$$ N = \begin{pmatrix} D & 0 \\ -1/f_2 & 1 \end{pmatrix}. \qquad (1.47) $$

Equation (1.47) allows us to determine the locations of the nodal points for
an arbitrary system. Assume that the system matrix S is given between two
points 1 and 2. Let Np1 be located at point 1', a distance z to the left of 1,
and Np2 be located at 2', a distance z' to the right of 2. Then, if the form
of Eq. (1.47) is imposed on Eq. (1.20), we obtain

$$ z' = \frac{D - s_{11}}{s_{21}} = Np_2, \qquad (1.48) $$
$$ z = \frac{1 - s_{22}}{s_{21}} = Np_1. \qquad (1.49) $$

By comparing Eqs. (1.34), (1.35) and Eqs. (1.48), (1.49), it is seen that the
nodal points coincide with the principal points if and only if D = +1.
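Eqs. (1.47) to (1.49) can be checked in the same way as the principal points. In this sketch the system matrix is hypothetical, with D = 5/6, and the translation matrix ((1, d), (0, 1)) is an assumption; translating the reference points to the computed nodal points produces a matrix of the form of Eq. (1.47), and, because D ≠ 1, the nodal points differ from the principal points.

```python
from fractions import Fraction as Fr


def matmul(a, b):
    """Product a @ b of 2x2 matrices stored as ((m11, m12), (m21, m22))."""
    return (
        (a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]),
        (a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]),
    )


def translation(d):
    """Assumed translation matrix over an axial distance d."""
    return ((1, d), (0, 1))


S = ((Fr(1, 2), Fr(1)), (Fr(-1, 6), Fr(4, 3)))     # hypothetical system matrix
(s11, s12), (s21, s22) = S
D = s11 * s22 - s12 * s21                          # 5/6 here

z, zp = (1 - s22) / s21, (D - s11) / s21           # Np1, Np2: Eqs. (1.49), (1.48)
principal = ((D - s22) / s21, (1 - s11) / s21)     # Pp1, Pp2: Eqs. (1.35), (1.34)

# Matrix between the nodal points; Eq. (1.47) predicts ((D, 0), (-1/f2, 1)).
N = matmul(translation(zp), matmul(S, translation(z)))
print(z, zp, *principal)    # 2 -2 3 -3
```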

It remains to be seen how the nodal points may be determined experimentally.
In Fig. 1.18 a set of parallel rays enters the system and is brought
to a focus at Fp2. The axial ray, A, passes through both nodal points at
zero inclination and reaches the image point on the axis. In Fig. 1.19 the
entire system has been rotated through a small angle φ about the second
nodal point. It is now ray B which enters the system headed toward Np1
and which leaves the system parallel to its entering direction. The entering
rays are still parallel and must be brought to a focus at the point where ray
B crosses FP2. Thus, when the system is rotated about an axis through Np2,
the lateral position of the image of a parallel set of rays does not change.
Hence, by examining the image of a distant object while rotating the system
about axes perpendicular to the optic axis, one may locate that axis which
passes through Np2. It should be obvious how one can locate Np1 in a similar
manner. The six points Fp1 and Fp2, Pp1 and Pp2, and Np1 and Np2 are
called the cardinal points of an optical system. Any four noncoincident
cardinal points completely determine the system matrix and the Gaussian
behavior of the optical system to which they belong.

Fig. 1.18. System with nodal points on the optic axis.


1.7 SUMMARY 


In the matrix formulation of an optical system in the Gaussian approxima- 
tion, such a system is determined when the locations of the cardinal points 
are found. Given an optical system, one must be able to form the system 
matrix as a product of reflection, refraction, and translation matrices, and 
from the system matrix obtain the location of the cardinal points. 

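The locations of the cardinal points in terms of the matrix elements are
summarized below (Fp1, Pp1, and Np1 are distances to the left of point 1;
Fp2, Pp2, and Np2 are distances to the right of point 2; D = Det(S)):

$$ f_2 = -\frac{1}{s_{21}}, \qquad f_1 = D f_2, $$
$$ Fp_1 = -\frac{s_{22}}{s_{21}}, \qquad Fp_2 = -\frac{s_{11}}{s_{21}}, $$
$$ Pp_1 = \frac{D - s_{22}}{s_{21}}, \qquad Pp_2 = \frac{1 - s_{11}}{s_{21}}, $$
$$ Np_1 = \frac{1 - s_{22}}{s_{21}}, \qquad Np_2 = \frac{D - s_{11}}{s_{21}}. $$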
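The summary translates directly into a short function. The sketch below is one way to package it (the function name and the dictionary keys are illustrative, not the book's); the example matrix is hypothetical.

```python
from fractions import Fraction as Fr


def cardinal_points(S):
    """Cardinal points of a Gaussian system from its matrix S between points 1 and 2.

    Entries labelled 1 are distances to the LEFT of point 1; entries labelled 2
    are distances to the RIGHT of point 2. An afocal system (s21 = 0) has no
    focal points, so it is rejected.
    """
    (s11, s12), (s21, s22) = S
    if s21 == 0:
        raise ValueError("s21 = 0: afocal system, cardinal points at infinity")
    D = s11 * s22 - s12 * s21
    f2 = -1 / s21
    return {
        "f1": D * f2, "f2": f2,
        "Fp1": -s22 / s21, "Fp2": -s11 / s21,
        "Pp1": (D - s22) / s21, "Pp2": (1 - s11) / s21,
        "Np1": (1 - s22) / s21, "Np2": (D - s11) / s21,
    }


S = ((Fr(1, 2), Fr(1)), (Fr(-1, 6), Fr(4, 3)))   # hypothetical, D = 5/6
for name, value in cardinal_points(S).items():
    print(f"{name} = {value}")
```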


The use of these equations may be demonstrated by the following example. 




Fig. 1.19. System rotated about nodal point two. 


Example 3. As in Fig. 1.20, a glass hemisphere has its flat face silvered. 
Light impinges on the spherical side from the left. The index of refraction 
of the glass is 1.5 and it is surrounded by air. Locate the cardinal points of 
the system. The radius of the spherical surface is taken as the positive 
quantity R. 

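The vertex matrix is the product of five matrices: refraction at the spherical
surface, translation to the silvered face, reflection there, translation back,
and refraction out through the spherical surface. Written out in terms of n
and R, it gives

$$ V = \begin{pmatrix} 1 & 0 \\ \dfrac{n-1}{R} & n \end{pmatrix}
\begin{pmatrix} 1 & -R \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}
\begin{pmatrix} 1 & R \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ -\dfrac{n-1}{nR} & \dfrac{1}{n} \end{pmatrix}
= \begin{pmatrix} \dfrac{2}{n} - 1 & \dfrac{2R}{n} \\ \dfrac{2}{R}\left(1 - \dfrac{1}{n}\right) & 1 - \dfrac{2}{n} \end{pmatrix}. $$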


It is important that the reader understand the origin of each term, thus 
checking his understanding of the conventions being used. 

A useful check on the calculation is the determinant, which is found to
be −1.


Fig. 1.20. Hemispherical thick mirror with the plane face silvered. 




Fig. 1.21. Cardinal points for a hemispherical thick lens. 


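For n = 1.5, we have

$$ V = \begin{pmatrix} \dfrac{1}{3} & \dfrac{4R}{3} \\ \dfrac{2}{3R} & -\dfrac{1}{3} \end{pmatrix}. $$

The cardinal points are thus found to be

$$ Fp_1 = +R/2, \qquad Fp_2 = -R/2, $$
$$ Pp_1 = -R, \qquad Pp_2 = +R, $$
$$ Np_1 = +2R, \qquad Np_2 = -2R. $$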


Since the reference points of the system are both at the spherical vertex, the 
cardinal points are as shown in Fig. 1.21. 
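Example 3 can be reproduced with exact arithmetic. The sketch assumes these elementary matrices: refraction into the glass, ((1, 0), (−(n − 1)/nR, 1/n)); translation across the hemisphere, ((1, R), (0, 1)); reflection at the plane silvered face, ((1, 0), (0, −1)); translation back, ((1, −R), (0, 1)); and refraction out, ((1, 0), ((n − 1)/R, n)). With R = 1 the distances come out in units of R.

```python
from fractions import Fraction as Fr


def matmul(a, b):
    """Product a @ b of 2x2 matrices stored as ((m11, m12), (m21, m22))."""
    return (
        (a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]),
        (a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]),
    )


def hemisphere_vertex_matrix(n, R):
    """Vertex matrix of the hemisphere with its plane face silvered (assumed factors)."""
    enter = ((1, 0), (-(n - 1) / (n * R), 1 / n))  # refraction air -> glass at the vertex
    to_face = ((1, R), (0, 1))                     # translation to the silvered face
    mirror = ((1, 0), (0, -1))                     # reflection at the plane face
    back = ((1, -R), (0, 1))                       # translation back to the vertex
    leave = ((1, 0), ((n - 1) / R, n))             # refraction glass -> air
    V = enter
    for M in (to_face, mirror, back, leave):
        V = matmul(M, V)     # each later element multiplies from the left
    return V


n, R = Fr(3, 2), Fr(1)
V = hemisphere_vertex_matrix(n, R)
(s11, s12), (s21, s22) = V
D = s11 * s22 - s12 * s21

cardinal = {
    "Fp1": -s22 / s21, "Fp2": -s11 / s21,
    "Pp1": (D - s22) / s21, "Pp2": (1 - s11) / s21,
    "Np1": (1 - s22) / s21, "Np2": (D - s11) / s21,
}
print(D, cardinal["Pp1"], cardinal["Pp2"])   # -1 -1 1
```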

In conclusion, it must be pointed out that we have not discussed the 
major practical problems of geometric optics, the aberrations of optical 
systems. For all but the most unsophisticated systems, designs based on the 
Gaussian approximation are insufficient for the production of quality images. 
In addition, for all real optical materials the index of refraction is not inde- 
pendent of wavelength and thus the locations of the cardinal points are 
functions of wavelength or color. Discussion of these difficulties belongs 
more properly to a textbook of optical design, and would take us too far 
afield to be included here. However, matrix methods exist to handle these 
problems also, and the interested reader is referred to the bibliography. 


PROBLEMS 


1.1 Locate the cardinal points of an optical system consisting of two thin lenses 
in air, separated by a distance t. If one thin lens has a positive focal length and 
the other a negative one, show that it is possible for the system to have a positive 
focal length with neither principal plane between the lenses. 




Figure P1.1 


Answer:

$$ \frac{1}{f} = \frac{1}{f_1} + \frac{1}{f_2} - \frac{t}{f_1 f_2}, \qquad Pp_1 = -\frac{tf}{f_2}, \qquad Pp_2 = -\frac{tf}{f_1}. $$


1.2 Locate the cardinal points of a sphere of radius R and index of refraction n, 
with air on both sides of the sphere. By considering the definition of the nodal 
points prove that they must be located at the center of the sphere without performing 
a calculation. 


Answer: f = nR/[2(n − 1)].


1.3 Locate the cardinal points of a glass hemisphere whose spherical side is 
silvered. Light enters from the left through the plane side. 


Answer: Principal points, R/n to right of plane surface. Focal point, R/2n to
right of plane surface.


1.4 The object-image relation for a thin lens is given by 1/s + 1/s' = 1/f, where
1/f = (n − 1)(K1 − K2). For such a thin lens a second, much dimmer, real image is
often found to the right of the lens, whose location is given by 1/s + 1/s' = (3n − 1)
(K1 − K2). Explain the origin of this image and derive the equation giving its
location.

1.5 The lens shown in Fig. P1.1 is surrounded by air. The radii of curvature are
equal and the thickness of the lens is t. Prove that the principal planes are separated
by the same distance t and that the focal length is positive.


1.6 If the surfaces of the lens in Problem 1.5 are not of equal curvature but have a 
common center of curvature, prove that the principal planes coincide at the common 
center of curvature and that the focal length is negative. 


1.7 In Fig. P1.2 a ray is shown proceeding toward the first nodal point of a
single thick lens in air. The point on the axis through which this ray passes is
known as the optical center of the lens. Prove that the location of this point is
independent of the index of refraction of the lens.

Figure P1.2

Figure P1.3


1.8 A doublet is constructed of two thin lenses separated by a distance t. The
lenses are made of the same glass and have an index of refraction n0 for light of
wavelength λ0. Prove that the focal length of the doublet is the same for all
wavelengths near λ0 if f1 + f2 = 2t.

1.9 In Fig. P1.3 a triple combination of thin lenses is shown. The positions of the 
first and last lenses are fixed but the middle lens is allowed to move. Prove that the 


distance between the focal points of the system is constant if x + y = 0 and f; = fs. 
Under these conditions show that the focal length of the system is given by 
f= -RIE f$. 

1.10 The first focal point and the first principal point of an optical system are
given as distances F1 and P1 to the left of some point A. The second focal point and
second principal point are likewise distances F2 and P2 to the right of some point B.
Show that the matrix S between these two points is given by

$$ S = \begin{pmatrix} \dfrac{F_2}{F_2 - P_2} & \dfrac{P_1 P_2 - F_1 P_2 - F_2 P_1}{F_2 - P_2} \\ \dfrac{1}{P_2 - F_2} & \dfrac{F_1}{F_2 - P_2} \end{pmatrix}. $$


1.11 Let the vertex matrix of an optical system be given as

$$ V = \begin{pmatrix} s_{11} & s_{12} \\ s_{21} & s_{22} \end{pmatrix}. $$

If the system is now reversed so that vertex a becomes vertex b and vice versa,
the system will possess a new vertex matrix, V'. Under the condition that
|V'| = |V| = 1, show that


1.12 In Fig. P1.4 an arbitrary ray is shown for a system with an odd number of
reflecting surfaces. If S is the system matrix connecting the coincident points 1 and
2, show that s11 = −s22.




Figure P1.4 


1.13 From the results of Problem 1.12 prove that the focal points of a system
with an odd number of reflecting surfaces coincide, as do the principal points and the
nodal points. Prove that the focal points are midway between the principal points
and the nodal points.


1.14 Illustrate the results of the last problem for the particular case of a thin lens
of focal length +f placed a distance d to the left of a mirror of radius −R.


1.15 Negative principal points are defined as conjugate points of unit negative 
lateral magnification. For a single lens in air show that the negative principal 
points are two focal lengths from the principal points. 


1.16 Let the matrix of a system be given between two reference points 1 and 2 
with 55; having D(S21) = +1. If points 1 and 2 are fixed but the system is moved a 
distance d to the right, show that the new matrix between 1 and 2 is 


j = ROS, afte ) 
Sí =R Sak, R lo i 
1.17 In Fig. P1.5 a refracting system is shown. Assume that the system matrix
S_ba is given with D(S) = 1. Point p is midway between points a and b and point
p' is a distance d to the left of point p. Let the system now be rotated about the
line OO' so that p' is fixed but the system is reversed. Show that the new matrix
between a and b is the same as the old matrix if d = (s11 − s22)/2s21.




Figure P1.5 


CHAPTER 2 


THE EXPERIMENTAL EVIDENCE 


Ordinarily, in introductory courses great ingenuity is exercised in making the 
concepts of physics as concrete as possible. Difficulty is frequently experi- 
enced by the student in later courses because formerly familiar concepts must 
be extended or radically revised when applied in a different domain. Such 
is usually the case with regard to the concepts of moment of inertia and 
angular momentum in the transition from particle dynamics to rigid body 
motion. The concept of angular momentum in quantum mechanics is 
significantly different from that in classical physics. 

It is important to realize that our study of light will be from a perspective 
very much different from that in a traditional course. The principal emphasis 
in such a course, implicitly at least, is the consideration of light as an electro- 
magnetic wave. This is possibly the easiest way, but often enough this results 
in over-dependence on the electromagnetic interpretation. 

Our initial preference is for a description of light in which the emphasis 
is on the observable states of light known as polarization states. The intention 
of this chapter is to describe experiments which give data essential for our 
perspective, to directly correlate experiments with polarization states, and 
to indicate how these states in turn may be correlated with a mathematical 
formalism. The sections dealing with the different kinds of refracting media 
are important to an understanding of the devices constructed to produce and 
analyze polarization states. 


2.1 THE QUANTITATIVE MEASUREMENT OF LIGHT AND QUANTA 


To begin our study, we must have some way of detecting light. There are
many possible detectors which could be employed, and normally the exact
type will not be specified. For our present purpose, let the detector be a
large hollow metal sphere with a tiny hole. After careful elimination of all
other possible energy inputs, it is found that if light is allowed to enter the
sphere, the temperature of the sphere increases, and we conclude that light
must be a carrier of energy. By measuring the temperature change of the
sphere, the time rate of change of the energy per unit area of the beam
entering the hole can be deduced. In principle, we have an intensity meter
which can be calibrated to read out in W/m² and which can be used to
calibrate more sophisticated detectors. In order to make life easier, let us
connect to the intensity meter an automatic recorder that continuously
plots the power input to the meter as a function of time.

We begin a second experiment by taking a source of light, say an electric
lamp powered by a constant voltage source, and enclosing it in a box with a
small hole through which light can emerge. A detector and the source of 
light are placed at a fixed distance from each other, and enclosed in a room 
with no other sources of light. If a number of filters are placed successively 
in the path of the light between the source and the detector, the automatic 
recorder will plot a graph like that shown in Fig. 2.1(a) (the filters could be 
simply dark pieces of smoked glass), where each jump down represents the 
addition of another filter. If the amplification of the recorder is increased 
and more filters are placed in the beam, it is found that the plot is not so 
smooth as before and begins to appear as in Fig. 2.1(b). Continuing the 
process of filter addition eventually results in a graph such as that in Fig. 
2.1(c). 

Since the plots are of intensity vs. time, the area under each peak in 
Fig. 2.1(c) represents an energy absorption by the intensity meter, and we are 
led to conclude that the energy which the light carries is not being absorbed 
in a continuous manner, but in discrete bursts. This is truly a remarkable 
and very important result. Our thought experiment is simply one instance 
of what is always observed: In every experiment performed to date in which 
the total absorption of light energy has been studied at sufficiently low intensities, 
this energy has always been observed in discrete amounts. Furthermore, it 
can be shown by other experiments that this absorption takes place in a
highly localized region of space (e.g., at a single silver atom inside one silver
grain in a photographic emulsion) and during a very small interval of time.
Since the absorption is discrete and highly localized in space and time, we 
associate with light the concept of a particle. This highly localized discrete 
energy or quantum of energy is called a photon. 

The concept of a photon as a particle must, however, be handled with 
care. Later we shall see that the location of a photon usually has a completely 
definite meaning only after the fact. For example, we can say where a photon 
was absorbed in a photographic plate; but once the energy is absorbed, the
photon no longer exists. A word of caution, then, before we proceed. We 
have not said, and will not at any time say, that light is discrete. We will say 
only that its energy is absorbed in discrete amounts. 

Returning then to the quanta of energy associated with a light beam, it
is soon discovered that these quanta are not all of the same energy. However,
it is possible to extract from a beam of light a sub-beam all of whose quanta
are almost of equal energy. There are many ways to do this, but a discussion
of these methods is not relevant at present. We will refer to monoenergetic
beams and sometimes to monochromatic beams, since energy and color prove
to be related.

Fig. 2.1. Transmissions of light through a series of opaque filters.


2.2 POLARIZING FILTERS 


A number of commercially available devices can generally
be grouped under the heading “polarizing filters”. The nature of these
devices can be discussed in terms of some theoretical description of light. 
Since we have no such description at present, we will examine these devices 
by a completely empirical investigation of their effect on a beam of light. 
Assume an unlimited supply of three types of filter, which will be labeled 
types R, L, and P. Later they will be called Right Circular (R), Left Circular 
(L) and Plane (P), but the applicability of these terms is not as yet evident. 

Our first observation is that when only one of a given type of filter is 
placed in a beam of light, the intensity of the transmitted beam is one-half 
that of the incident beam. Moreover, it is found that the reduction of the 
intensity by one-half does not occur for all possible sources of light. For 
example, it does not occur for certain laser beams. In what follows in this 
and the next chapter, it will be assumed that the light source emits nearly 
monochromatic light and is such that each filter, upon encountering directly 
light from the source, transmits only one-half the intensity of the light 
incident on the filter. We will call such light unpolarized light. Many 
common light sources, though not monochromatic, produce unpolarized 
light. 

Secondly, it is found that if either an R- or an L-filter is in the beam and 
an additional filter of the same type is placed in the beam, no further reduc- 
tion of the beam intensity occurs. Rotation of the R- or L-types about the 
beam direction (which will be called the optic axis) does not affect the 
transmitted intensity, nor does rotation by 180° about an axis perpendicular 
to the optic axis have any effect. If an R-filter is already in the beam and it 
is followed by an L-filter, the transmission drops to zero. Interchanging the 
positions of the R- and L-filters still results in zero transmission. Rotation 
of either or both R and L about the optic axis has no effect on this result. 

Now, what about the P-filter? Well, if two P-filters are placed in the
beam, it is found that on rotating only one of the filters about the optic axis the
intensity of the transmitted light varies between zero and 100% of Imax,
the intensity of the light which is transmitted by the filter which first intercepts
the beam. Let us orient the two filters for maximum transmission and draw
a line on the face of each filter such that the lines on the two filters are parallel.
These lines have been chosen such that when they are parallel maximum
transmission is obtained. This process could be repeated for all the other
P-filters at our disposal. Note that the direction of the lines of the P-filters
has not been chosen in a unique way. This arbitrariness will be removed in
Section 2.8. If we now measure the intensity transmitted as a function of the
angle θ between the lines of any two P-filters, we find the intensity to vary as

$$ I = I_{\max} \cos^2\theta. \qquad (2.1) $$


This experimental relationship is known as Malus’ law. Finally, if a P-filter 
is placed before or after an R- or L-filter, an additional decrease in intensity 
by one-half occurs and rotation of either filter has no effect. 
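The observations on P-filters can be combined into a small calculation for a chain of filters. The sketch below assumes ideal filters, unpolarized light at the first filter (so that it transmits one-half), and that Eq. (2.1) applies to each successive pair of filters; the function name is hypothetical.

```python
import math


def chain_intensity(I0, line_angles_deg):
    """Intensity after unpolarized light of intensity I0 crosses ideal P-filters.

    line_angles_deg lists the direction of each filter's line, in degrees,
    in the order in which the beam meets the filters.
    """
    if not line_angles_deg:
        return I0
    intensity = I0 / 2                     # one filter passes half of unpolarized light
    for before, after in zip(line_angles_deg, line_angles_deg[1:]):
        theta = math.radians(after - before)
        intensity *= math.cos(theta) ** 2  # Eq. (2.1) for each successive pair
    return intensity


print(chain_intensity(1.0, [0]))           # 0.5
print(chain_intensity(1.0, [0, 60]))       # one-quarter of the first filter's output
print(chain_intensity(1.0, [0, 90]))       # crossed lines: essentially zero
```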


2.3 STATES AND PREDICTION 


Fig. 2.2. Transmission through polarizing filters.

The experimental results discussed in the preceding section and summarized
in Fig. 2.2 indicate that the beams emerging from an R- or L-filter are
different in some property which has not been discussed. That is, two light
beams can both be monochromatic and still differ. Let us simply say that
light which has traversed an L-filter is in an L-state; similarly, that which has
traversed an R- or P-filter is in an R- or P-state, respectively. Light which 
has been prepared in an R-state will always traverse a following R-filter but 
never a following L-filter. Similar statements can be made concerning a 
prepared L-state. In other words, the outcome of an experiment involving 
light in either an R- or L-state incident on a series of R- and L-filters in any 
combination can be completely predicted. By complete prediction is meant 
that the outcome of an experiment can be stated with certainty (in terms of 
probability, the probability of an outcome is either zero or one). If complete 
prediction is possible, we will say that the state involved is pure. Note that 
the word “state” applies to the light but that the term “pure state” is defined
in terms of the light and the type of experiment performed. 

In the example involving R- and L-filters complete prediction was 
possible, but is this always the case? Can a state be pure in relation to one 
experiment and not pure in relation to another? Can there be experiments 
in relation to which there are no pure states? 

To answer these questions let us prepare a monochromatic beam and 
place an R-filter followed by a P-filter in the path of the beam. We find that 
the light which has first traversed the R-filter comes through the P-filter with 
its intensity reduced by one-half. Consequently, light in an R-state is not 
pure in relation to the P-filter. Are there pure states relative to the RP-filter 
pair? No, because only light in an R-state is totally transmitted by the 
R-filter, and light in an R-state does not have 100% transmission through a 
P-filter. In terms of the photon concept, we can only give a 50/50 probability 
that the “next” photon which passes through the R-filter will either pass 
through or be absorbed in the P-filter. 


2.4 SUPERPOSITION OF STATES 


We now wish to describe an experiment which shows how two different states 
may be combined to produce yet another state. Consider the experimental 
arrangement shown in Fig. 2.3, which is known as a Mach-Zehnder interferometer.
M1 and M2 are half-silvered mirrors and M and M' are full-silvered mirrors.
A monochromatic incident beam striking M1 is split and
recombines at M2. A and A' are variable absorbers whose sole function is
to control the intensity of the light in each beam. The paths M1M'M2 and
M1MM2 are chosen such that each beam within the apparatus travels the
same distance. The filters placed in various beams are designated by the 
nomenclature already adopted. Two wedges of transparent glass form a 
variable retarder, F, whose thickness can be varied by simply moving the 
wedges relative to each other. The role of the retarder will become evident 
as the experiment is discussed. 

Assume for the moment that the thickness of the retarder F is zero, and
bear in mind that the light striking M1 is in a P-state since it has passed
through a P-filter. If the absorbers are varied such that only the upper beam
is transmitted by the apparatus, the outgoing light is observed to be in an
R-state; if only the lower beam is transmitted, the outgoing light is in an
L-state. Now suppose that each absorber is set to give complete transmission
of each beam. An analysis of the emergent beam by P' shows that it is in a
pure P-state whose line is the same as the incident P-state. This is true even
in the case of low intensity, when individual photons can be counted.

Fig. 2.3. The Mach-Zehnder interferometer.

Our experiment indicates that a P-state can be represented in some way 
as a superposition, or an adding together, of R- and L-states. The final
P-state in our experiment has somehow been prepared by the entire 
apparatus. The individual R- and L-states observed when one path is closed 
are prepared by a different apparatus. The state obtained by the entire 
apparatus is not obtained by randomly mixing R- and L-states. We do not 
find that statistically half of the emerging quanta are R's and half L’s. They 
are all P's. 

Let us now examine the outcome of an experiment in which the thickness
of the retarder F is varied. In Fig. 2.3 the polarizing and analyzing filters,
P and P', respectively, have their “lines” initially parallel. While the lower
beam traverses the retarder, it travels at a lower speed given by v = c/n,
where c is the velocity of light in vacuo and n is the index of refraction of F.
If τ is the time required for the lower beam to traverse a retarder of thickness
t, then the optical path length is defined to be the distance in vacuum which
the beam would traverse in the same time τ. It may be shown that the optical
path length is given by nt. Since in the time τ the beam advances only the
distance t through F, whereas in the absence of F it would have traveled the
distance nt, the beam is retarded by the distance t(n − 1), which will be
called the retardation. As we continuously increase the retarder thickness
from zero, the emerging light remains in a P-state, but the line of P' must be
rotated about the emergent beam axis in order to maintain maximum
transmission. After a certain thickness d has been inserted, the line of P' has
rotated through 180° and we have recovered a pure P-state in relation to
filter P. For each further increase d in the thickness of F, the optical path of
the lower beam increases over that of the upper beam by an amount (n − 1)d,
and a pure P-state in relation to filter P is again recovered.
We conclude that light somehow repeats itself in space; it is characterized
by a spatial periodicity. Thus, we can assign to our light a length which is
as characteristic of the light as is its energy. If this experiment is performed
with light of various energies and the characteristic length for each energy is
measured, we find that E = b/λ, where b is a constant and λ is the
characteristic length, or wavelength. The wavelength is numerically equal
to the increase in optical path length necessary in our experiment to cause
the line of P' to complete one half-revolution of 180°. For visible light the
common unit of wavelength is the angstrom (Å), a length of 10⁻⁸ cm;
visible light falls in the range 3000-8000 Å.
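The retardation arithmetic above can be sketched in a few lines. This is a minimal illustration, not part of the original text: it computes the retardation t(n − 1), the thickness d that adds one wavelength of retardation, and the angle through which P' must be turned. The assumption that the line of P' turns in proportion to the retardation (180° per wavelength) and the index 1.5 for the glass wedges are both illustrative choices, not measured values from the experiment.

```python
# Sketch of the retarder arithmetic described above (all lengths in cm).
# Assumption, not stated in the text: the line of P' turns in proportion to
# the retardation, 180 degrees for each wavelength of retardation.

def retardation(t, n):
    """Retardation s = nt - t = t(n - 1) produced by a plate of thickness t."""
    return t * (n - 1.0)

def half_turn_thickness(wavelength, n):
    """Thickness d whose retardation (n - 1)d equals one wavelength."""
    return wavelength / (n - 1.0)

def p_prime_angle(t, n, wavelength):
    """Angle (degrees, modulo 180) through which P' must be turned."""
    return (180.0 * retardation(t, n) / wavelength) % 180.0

LAMBDA = 5000e-8      # 5000 angstroms in cm (1 angstrom = 1e-8 cm)
N_GLASS = 1.5         # hypothetical index for the glass wedges

d = half_turn_thickness(LAMBDA, N_GLASS)
print(f"thickness for a 180-degree turn of P': {d:.2e} cm")
print(f"turn of P' after inserting d/2: {p_prime_angle(d / 2, N_GLASS, LAMBDA):.1f} deg")
```

Working in centimetres keeps the angstrom conversion (1 Å = 10⁻⁸ cm) visible in the code.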

Since they are derived from the same source, there is a very definite 
relationship between the two beams in the apparatus. The emerging beam 
is always in a P-state when both arms of the apparatus are open. If one 
attempts to combine R- and L-states from different independent sources, the 
combined beam is not in general pure in relation to any of the filters and 
consists of a random mixture of R- and L-states. If a definite relationship 
exists, the resultant superposition is said to be coherent; otherwise, it is 
incoherent. The concept of coherence will be examined in more detail in 
later chapters. 


2.5 E-STATES 


At this point the student may be wondering whether we are finished with 
this business of describing experiments and giving their results. Not quite. 
The apparatus shown in Fig. 2.3 has possibilities which have not been 
explored. Recall that the separate beams combined to give a P-state when 
the absorbers A and A’ were set to allow total transmission of each beam.
Can other states be produced? Of course. Set A’ to block the upper beam 
and the emerging light is in an L-state; block the lower beam to get an 
R-state. Now, for a given retarder thickness, let A’ give total transmission 
and vary A continuously to give transmission from 0 to 100%. We observe 
a continuous transition of the emerging light from an R-state to a P-state. 
On interchanging A and A’ a similar transition is observed from an L-state
to a P-state. The states observed “in between” the R- and P-states, and
between the L- and P-states, are called E-states. Since any E-state is a
superposition of R- and L-states, the R-, L-, and P-states are simply
particular E-states.

Fig. 2.4. The intensity transmitted through a P-filter for various incident states
as a function of the orientation of the P-filter. The quantity φ is as used in Eq.
(3.11).

If the light from the interferometer is analyzed with the filter P' in Fig. 
2.3, the intensity of the light transmitted by P' generally varies with the angle 
which the line of P' makes with the filter P. In Fig. 2.4 is shown a polar plot 
of the intensity of the light transmitted by P' vs. the angle of rotation of P' 
for different emergent states from the interferometer. Two states are labeled
E; R and P designate R- and P-states, respectively. The values of φ labeling
the states will be explained in Chapter 3.

We have seen that the R-, L-, and P-states are pure in relation to the type
of filter which produced them; i.e., light in each of these states is
transmitted without loss through a filter identical to the one which produced
it. Furthermore, from the set of R-, L-, and P-filters we can extract
two pairs of filters which possess a common property: the states of light
produced by the two members of a given pair are mutually exclusive, in the
sense that the state transmitted by one filter is not transmitted by its partner.
Such mutually exclusive pure states are said to be orthogonal. Two P-filters
whose lines are oriented at 90° relative to each other, for example, produce
orthogonal states.

To each E-state produced by our interferometer there corresponds a filter in
relation to which it is pure. Do E-states, then, also come in orthogonal pairs?
The answer is yes. To create an E-state orthogonal to a given one, the
absorber in one arm of the interferometer is interchanged with that in the
other arm, and an additional retardation of one half-wavelength is introduced
in either arm.

One conclusion to be drawn from the interferometer experiments is that
any polarization state can be obtained by the coherent superposition of an
R-state and an L-state. It is pertinent to emphasize that this is not the only
possibility. We have asserted that for each E-state there exists a
corresponding filter. Now assume that we have two filters, E₁ and E₂,
which produce orthogonal states. What is observed if the R- and L-filters
are replaced by the E₁- and E₂-filters? It is found, again, that any polarization
state can be obtained by proper adjustment of the retarders and absorbers.
Thus, the representation of an arbitrary E-state as a superposition
of R- and L-states is not unique; any pair of orthogonal states could be used.

A crucial experimental result is now given. As above, let E₁ and E₂ be
orthogonal pure states. For any given beam of intensity I₀ which strikes an
E₁-filter, let the transmitted intensity be I₁. If the same beam strikes an
E₂-filter, let the transmitted intensity be I₂. Experimentally it is found that


$$I_0 = I_1 + I_2.$$


Since the probabilities p₁ and p₂ that the given state will pass an E₁- or an
E₂-filter are given by


$$p_1 = I_1/I_0, \qquad p_2 = I_2/I_0,$$

it follows that


$$p_1 + p_2 = 1. \qquad (2.2)$$
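Equation (2.2) can be checked numerically for one familiar pair of orthogonal states: two P-filters whose lines are 90° apart. This is a minimal sketch under stated assumptions: the incident beam is taken to be a P-state whose line makes an angle θ with the first filter, and Malus' law supplies the transmitted fractions. The choice of P-states for E₁ and E₂ is ours, for illustration only.

```python
import math

# Sketch: checking p1 + p2 = 1 (Eq. 2.2) for one familiar pair of orthogonal
# states, two P-filters whose lines are 90 degrees apart. The incident beam is
# assumed to be a P-state whose line makes angle theta with the first filter,
# and Malus' law gives the fraction each filter transmits.

def pass_probabilities(theta_deg):
    """Return (p1, p2) for P-filters with lines at 0 and 90 degrees."""
    theta = math.radians(theta_deg)
    p1 = math.cos(theta) ** 2                  # Malus' law, first filter
    p2 = math.cos(theta - math.pi / 2) ** 2    # second filter, line at 90 degrees
    return p1, p2

for angle in (0.0, 30.0, 45.0, 72.5):
    p1, p2 = pass_probabilities(angle)
    print(f"theta = {angle:5.1f} deg: p1 = {p1:.3f}, p2 = {p2:.3f}, p1 + p2 = {p1 + p2:.3f}")
```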


2.6 KETS 


In classical mechanics little progress can be made until vector calculus is
developed and a consistent notation is used to represent vectors. The
problem can be solved simply by printing a symbol in bold type to indicate
a vector. We will indicate that our symbols R, L, etc., represent states by
using a notation due to Dirac: |R⟩, |L⟩, |E⟩, etc. Just as we say that F is a
vector, we will say that |E⟩ is a ket. Now, it appears natural to express the
result of one of our experiments in Section 2.4 with an equation such as
|P⟩ = |R⟩ + |L⟩. This simply says that a P-state can be obtained by the
superposition of an R- and an L-state. But we know that a simple
superposition of an R- and an L-state will not necessarily give a P-state. In fact,
a continuous succession of states from R- to P- to L- can be produced by
varying the absorber in one arm of the apparatus (Fig. 2.3) from 0 to 100%
transmission, while the absorber in the other arm is varied from 100 to 0%.
Since we think of the absorbers as determining how much of each beam
contributes to the superposition, our “addition” can be extended to
|E⟩ = a|R⟩ + b|L⟩, where the constants a and b give the proportional
amount of |R⟩ and |L⟩ contributed to the final combination.

What does our experimental evidence tell us about these constants?

In the interferometer experiments the retarder F also plays a role in
determining the state of the emergent light. In Fig. 2.3 the L-state is modified
in the superposition by the presence of the retarder, and this modification is
to be included in the multiplicative constant b. The retarder changes the
state L to the state L’ with |L’⟩ = b|L⟩. If t is the thickness of the retarder
and n its index of refraction, then the retardation s is given by s = t(n − 1).
Since it is the retarder that affects the state |L⟩, b is a function of the
retardation and we write |L’⟩ = b(s)|L⟩. It will be assumed that b(s) is a
continuous function of s. If an additional retarder of thickness t’, with
retardation s’, is inserted, the state L’ is changed to the state L” such that
|L”⟩ = b(s’)|L’⟩. Now, the same state L” must result if the total thickness
(t + t’) is inserted at once. Consequently, we must have |L”⟩ = b(s + s’)|L⟩,
and b(s’)b(s) = b(s + s’). The only continuous, nonvanishing function
satisfying this condition is the exponential function exp(αs), where α is a
constant; allowing also for the absorber in the same arm, which multiplies the
amplitude by a further constant B, we write b(s) = B exp(αs). Recall that with
a suitable continuous increase of retarder thickness the emergent state rotates
through a series of states until it returns to its initial state. This demands
that the exponential function be periodic, and this can be so only if the
exponent is pure imaginary. Therefore, we write b(s) = B exp(iks), where B
and k are real numbers.
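The functional-equation argument can be checked numerically. The sketch below is illustrative only: it takes B = 1 (a retarder with no absorber, the case in which the product rule b(s’)b(s) = b(s + s’) holds exactly) and assumes k = 2π/λ, which follows if the emergent state repeats each time the retardation grows by one wavelength, as in Section 2.4. The wavelength and the two retardations are arbitrary sample values.

```python
import cmath
import math

# Sketch of the retarder factor b(s) = B exp(iks) with B = 1 (retarder only).
# Assumption: taking the period of b to be one wavelength, consistent with the
# Section 2.4 result that the emergent state repeats after a retardation of one
# wavelength, gives k = 2*pi/wavelength.

WAVELENGTH = 5893e-10            # metres; illustrative value
K = 2 * math.pi / WAVELENGTH

def b(s, B=1.0, k=K):
    """Factor multiplying |L> after a retardation s (metres)."""
    return B * cmath.exp(1j * k * s)

s1, s2 = 1.3e-7, 4.2e-7          # two hypothetical retardations

# Retarders in succession act like one retarder of summed retardation.
print(abs(b(s1) * b(s2) - b(s1 + s2)))
# Adding one wavelength of retardation restores the same factor.
print(abs(b(s1 + WAVELENGTH) - b(s1)))
# A pure-imaginary exponent leaves the magnitude at B.
print(abs(b(s1)))
```

A real exponent would pass the first check but fail the second, which is the point of the periodicity argument.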

From the above argument we can see that if retarders are in both arms 
of the interferometer in Fig. 2.3, then the superposition can be written as 


$$|E\rangle = a|R\rangle + b|L\rangle, \qquad (2.3)$$


where a and b may both be complex. 

This equation bears an extraordinary resemblance to the vector addition 
of two orthogonal vectors in two-dimensional vector space. It suggests that 
we can think of a polarization state as a "vector sum" of two mutually 
exclusive states. This analogy will be developed at some length in the follow- 
ing chapter. 




2.7 THE LINEAR AND ANGULAR MOMENTA OF LIGHT 


There are very few basic quantities discussed in physics. In other words, 
there are few quantities which obey a conservation law. Other than energy, 
the remaining quantities which we will consider are linear and angular 
momenta. Let us imagine a torsional pendulum suspended from a very fine 
quartz fiber lying along the axis, as in Fig. 2.5, and enclosed in a vacuum 
chamber to reduce frictional effects. The pendulum consists of a horizontal 
cross-arm whose equilibrium position is along the y-axis, connected to the 
vertical quartz fiber. The oscillating mass can be a black surface at one end 
of the cross-arm with a similar mass at the opposite end to act as a counter- 
weight. The plane of the black surface is parallel to the quartz fiber. We 
now shine an off-axis beam of light, traveling parallel to the x-axis, onto
our black surface in short bursts whose frequency coincides with the natural
frequency of the pendulum, and a low-amplitude oscillation is observed. As
the pendulum swings through its equilibrium position, it possesses a certain 
amount of linear momentum. By an appeal to conservation of linear 
momentum, we can only deduce that the light which has been absorbed in 
the black surface carried with it a certain amount of linear momentum. If 
we have arranged for our beam of light to be monoenergetic, or nearly so, 
we may compute the number of energy quanta which have been absorbed 
by the pendulum, knowing the intensity of the beam. By measuring the 
amplitude of the oscillations we may compute the linear momentum which 
has also been absorbed. If we divide the number of energy quanta into the 
linear momentum, we find that, regardless of the particular energy used, the 
linear momentum associated with the discrete energy bundles is given by 


$$|\mathbf{p}| = E/c = h/\lambda, \qquad (2.4)$$


where h is Planck’s constant, which has the experimentally determined value
h = 6.63 × 10⁻³⁴ J sec, and c is the velocity of light in vacuo.
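The relation |p| = E/c = h/λ turns the pendulum measurement into simple bookkeeping: the momentum of a burst is the number of quanta times h/λ, which equals the burst energy divided by c. Comparing E = b/λ from Section 2.4 with Eq. (2.4) also implies b = hc. The sketch below uses the value of h quoted above; the wavelength and the burst energy W are illustrative assumptions, not values from the text.

```python
# Sketch: momentum of one quantum from Eq. (2.4), and the total momentum
# delivered to the black surface by a burst of light. Comparing E = b/lambda
# with |p| = E/c = h/lambda implies b = hc. The wavelength and the burst energy
# are illustrative values, not taken from the text.

H = 6.63e-34      # Planck's constant, J s (value quoted above)
C = 3.00e8        # velocity of light in vacuo, m/s (rounded)

def photon_energy(wavelength):
    """E = hc/lambda, in joules, for a wavelength in metres."""
    return H * C / wavelength

def photon_momentum(wavelength):
    """|p| = h/lambda, in kg m/s."""
    return H / wavelength

lam = 5000e-10                   # 5000 angstroms in metres
E = photon_energy(lam)
p = photon_momentum(lam)
print(f"E = {E:.3e} J, h/lambda = {p:.3e} kg m/s, E/c = {E / C:.3e} kg m/s")

W = 1.0e-3                       # hypothetical energy of one burst, J
n_quanta = W / E                 # number of quanta absorbed
print(f"{n_quanta:.3e} quanta deliver {n_quanta * p:.3e} kg m/s (W/c = {W / C:.3e})")
```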




Fig. 2.5. Torsion pendulum for detecting the linear momentum associated with a 
light beam. 




Not only does our pendulum possess linear momentum, but through its 
rotation it must also possess an angular momentum. This rotational momen- 
tum, however, arises from the particular moment arm of our black surface. 
If the beam and surface had been perfectly centered on the fiber instead of 
the cross-arm, no rotation would have taken place. In a given system angular 
momentum is computed from r × p, where r is the position vector measured from
the origin about which the angular momentum is computed; in a particular
system a portion, if not all, of this can be transformed away by an appropriate
choice of origin. This type of angular momentum is often called orbital 
angular momentum. As an example, consider the purely classical situation 
in Fig. 2.6. A ball of mass M travels in the xy-plane with velocity v parallel 
to the x-axis. It is also spinning about an axis through its center which is 
parallel to its velocity. 

The total angular momentum can be considered as being made up of 
two parts. The orbital part L whose magnitude is given by 


$$|\mathbf{L}| = |\mathbf{r}||\mathbf{v}|M\sin\theta = |\mathbf{v}|Md$$


can be transformed away by shifting the origin to the point P on the y-axis. 
The other angular momentum, S, is that associated with the spinning of the 
ball about its center. This cannot be altered by shifting the origin. 
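The origin dependence of the orbital part can be made concrete with a direct r × p computation. The geometry here is an assumption, since Fig. 2.6 is not reproduced: the ball is taken to move along the line y = d with velocity v along +x, and P is taken to be the point (0, d, 0). About the origin the result has magnitude |v|Md at every point of the path; about P it vanishes. The spin S is not computed because it does not involve r at all.

```python
# Sketch: orbital angular momentum r x p of the ball in Fig. 2.6 about two
# different origins. Assumed geometry (hypothetical, since the figure is not
# reproduced here): the ball moves along the line y = d in the xy-plane with
# velocity v along +x, and P is the point (0, d, 0) on the y-axis. The spin S
# does not involve r, so no shift of origin can change it.

def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def orbital_L(position, momentum, origin=(0.0, 0.0, 0.0)):
    """r x p with r measured from the chosen origin."""
    r = tuple(x - o for x, o in zip(position, origin))
    return cross(r, momentum)

M, v, d = 2.0, 3.0, 0.5                  # illustrative values, SI units
p = (M * v, 0.0, 0.0)
P = (0.0, d, 0.0)

for x in (-1.0, 0.0, 4.0):               # several points along the path
    ball = (x, d, 0.0)
    print(x, orbital_L(ball, p), orbital_L(ball, p, origin=P))
```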

The distinction between orbital and spin angular momentum is not 
unambiguous in classical mechanics. For example, the spin momentum of 
the ball may be thought of as being the sum of the orbital momenta of each 
particle of the ball about its center. In Chapter 9, however, we shall see that 
a clearer mathematical distinction may be made from the viewpoint of 
quantum mechanics. 

The existence of orbital angular momentum in our experiment was 
unavoidable since light has linear momentum. We now ask whether light 
possesses angular momentum which is independent of a translational origin. 


Fig. 2.6. The orbital angular momentum, L, and the spin angular momentum, 
S, of a classical particle. 




Fig. 2.7. Polarized light incident on an absorbing disk which is free to rotate about 
the axis of the beam. 


If for the moment we adopt the dangerous viewpoint of regarding the photon 
as a particle, we ask whether this particle can be "spinning". We construct 
another experiment, shown in Fig. 2.7, in which the plane of our absorber 
is perpendicular to the axis of our support and we shine our beam along the 
axis of support. If we use our original raw beam as it comes from the source, 
we will find no detectable induced rotation of the disk. Even if there were a 
nonorbital or intrinsic angular momentum associated with the many photons 
striking the disk, this result is what we would expect. The odds are that the 
individual photons would have their spin angular momentum vectors 
oriented at random with respect to the axis and, hence, have no net angular momentum
along the direction of the axis or along the direction of their velocity, which 
is the same thing. 

At our disposal, however, are our polarizing filters, which have the 
ability to reduce randomness in a beam and give us pure states. We thus 
repeat our experiment for the detection of intrinsic angular momentum 
using pure states of one type or another. Using P-states, we find no rotation. 
If we allow a beam which has been placed in an R-state to strike our appar- 
atus, we find that the disk does indeed begin to rotate and in such a direction 
that its angular momentum vector points along the velocity of the incident 
beam. If we use L-light, we find a rotation in the opposite sense with the 
angular velocity vector pointing antiparallel to the velocity of the incident 
beam. We can measure the total angular momentum of the disk after a 
given amount of energy is absorbed. If we determine the number of quanta
which have been absorbed and divide this into the total angular momentum, we
find that with each quantum of energy absorbed an amount of angular
momentum equal to ħ = h/2π is absorbed.

If we now prepare our beam by using a variable E-filter, we find, as we
change our beam in a continuous manner from R to L, that the apparent
component of angular momentum in the direction of propagation absorbed
with each photon varies from +ħ to −ħ. One interpretation would be that
each quantum carries an intrinsic angular momentum of magnitude ħ whose
orientation relative to the velocity can vary from 0 to 180°. Thus, R-states
have the velocity and angular momentum parallel; L-states have them
antiparallel; and the general E-state has some intermediate orientation.
P-states would thus have their angular momentum vector perpendicular to the
velocity, and the plane determined by the angular momentum and the
velocity could determine the plane of P-polarization. The only part of this
interpretation that we shall presently find correct is its account of
R- and L-states.

Let us recall that if a P-state hits an R-filter followed by an L-filter, we 
will find half of the quanta being absorbed in the R and the remaining half 
absorbed in the final L. Yet we cannot think of the P-state itself as being a 
random combination of R’s and L’s, for then half of the P-states would be
absorbed in a second P-filter. We have seen that we can write


$$|P\rangle = \alpha|R\rangle + \beta|L\rangle,$$


and we must again attempt to understand the nature of this coherent super- 
position. Recalling our interferometer experiment, we interpreted the 
superposition of two states over the two possible paths not to mean so many 
photons coming along each path, but to indicate the number we would 
receive over each path if we forced the energy to appear in either one path 
or the other. We said that the photon observed with both paths open did 
not come by one or the other but came through the whole apparatus. The 
particle nature of light, which is sometimes implied by the use of the word 
“photon”, is only valid if we force the light to appear in some particular 
place. If we ask the light where it is, it will answer us by appearing some- 
where. Less anthropomorphically, light can obviously be detected only 
where we have placed a detector. The location of a photon is meaningful 
with any sort of rigor only during the process of absorption or emission. To 
apply this same interpretation to our polarization states and their interpreta- 
tion in terms of angular momentum, we will say the following: To ask for 
the orientation of the spin angular momentum of a photon is meaningless 
unless we carry out an experiment to determine that orientation. When such 
experiments are performed, the spin angular momentum of an individual 
photon is always found to be either parallel or antiparallel to its velocity. 

Thus, representing elliptically polarized light as a coherent superposition
of R- and L-states as


$$|E\rangle = \alpha|R\rangle + \beta|L\rangle$$


tells us, through the constants α and β, the relative probability of the
photon’s appearing with angular momentum ħ along the direction of its
motion or with ħ opposed to this direction.
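The probabilistic reading of α and β can be illustrated with a short calculation. The sketch assumes the standard quantum rule, developed later in the book rather than in this section, that the two outcomes +ħ and −ħ occur with probabilities proportional to |α|² and |β|². Under that assumption an R-state always gives +ħ, an L-state always gives −ħ, equal weights (as for a P-state) give a mean of zero, consistent with the absence of rotation observed with P-light, and a general E-state gives a mean between −ħ and +ħ.

```python
# Sketch: reading off the two possible outcomes (+hbar or -hbar along the
# direction of motion) from |E> = alpha|R> + beta|L>.
# Assumption (standard quantum rule, developed later in the book, not stated
# here): the probabilities are proportional to |alpha|^2 and |beta|^2.

def spin_outcomes(alpha, beta):
    """Return (P(+hbar), P(-hbar), mean component in units of hbar)."""
    wa, wb = abs(alpha) ** 2, abs(beta) ** 2
    total = wa + wb
    p_plus, p_minus = wa / total, wb / total
    return p_plus, p_minus, p_plus - p_minus

print(spin_outcomes(1, 0))             # R-state
print(spin_outcomes(0, 1))             # L-state
print(spin_outcomes(1, 1j))            # equal weights, as for a P-state
print(spin_outcomes(0.9, 0.3 + 0.2j))  # a general E-state
```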

Our interpretation of R- and L-states in terms of angular momentum
allows us to represent these states by geometric figures which have
basically the same spatial symmetry. Angular momentum is essentially
a quantity which defines in space both a direction and a sense of handedness:
that is, a rotation about a certain direction and a rotation in a certain sense,
either clockwise or counterclockwise. We could thus represent these states
as is shown in Fig. 2.8(a).

As for all entities which possess a “handedness”, it is well known that
reflection in a mirror changes the handedness from right to left or vice versa,
as shown in Fig. 2.8(b). If we now shine a beam of light through an R-filter,
allow it to reflect from a mirror, and send it back through an R-filter, we find
that it is completely absorbed. If we instead send it back through an L-filter,
it is completely transmitted. Reflection interchanges R- and L-states, as
shown in Fig. 2.8(c).


2.8 ANISOTROPIC RETARDERS 


Substances may be classified in a variety of ways: as conductors-insulators, 
gases-liquids-solids, etc. For a given macroscopic property materials may 
be classified as being homogeneous or inhomogeneous, and isotropic or 
anisotropic. If a given property is the same everywhere in the material, then 
the substance is said to be homogeneous relative to the given property. A 
substance is said to be isotropic relative to a given property if at any point 
within the substance the given property is independent of direction. As an 
example, consider a wooden board. If there are knot-holes, the board is 
inhomogeneous: its properties differ from one spot to another. On the 
other hand, even without knot-holes, the board is anisotropic. It is easier 
to break it along the direction of the wood grain than it is to break it across 
the grain. 

Let us first consider an optical material which is both homogeneous and 
isotropic relative to the index of refraction. Ordinary glass is an example of 
such a material. Consider a plate of glass in air with a narrow beam of 
monochromatic light striking the glass at an angle of incidence, α. Since the
refracted ray is rectilinear, the index of refraction for each point along the 
ray is the same. If the index of refraction were not the same for each point, 
the refracted ray would curve. By varying the angle of incidence and the 
horizontal position of the plate we see that there are an infinite number of 
possible refracted rays, and for all of these possibilities we get the same 
number for the index of refraction. 

This means that the index of refraction is the same at each and every
point in the glass and that the glass is homogeneous relative to the index of
refraction. The glass is also isotropic relative to the index of refraction, since
the index of refraction is independent of the direction of the refracted ray
which passes through. Since the index of refraction is n = c/v, where v is the
speed of light in the refracting medium, the speed of light in the glass is the
same everywhere and is independent of the direction of the refracted ray.

Fig. 2.8. Geometric symmetries of R- and L-states.

Fig. 2.9. Light incident on a birefringent crystal. The optic axis of the crystal is
parallel to the plane of the paper. Hatch marks indicate that the E-ray is a P-state
whose plane of polarization is in the plane of the paper. The dots on the O-ray
indicate a P-state whose line is perpendicular to the paper.

There do exist materials which are homogeneous, but anisotropic, 
relative to the index of refraction. Such materials are said to be birefringent. 
The index of refraction is not a function of position within the substance 
but is determined by the direction and polarization state of the refracted 
light. A typical example of such a material is calcite, and in Fig. 2.9 is shown 
a narrow beam of unpolarized light striking one face of a calcite crystal. In 
this case the phenomenon of double refraction occurs. The original beam 
splits into two beams, which are labeled O and E. If analyzed with a P-filter, 
these beams are found to be in orthogonal P-states. In crystal optics the 
O-ray (ordinary ray) is defined as the ray which obeys Snell's law, and the 
E-ray (extraordinary ray) as that which does not. Thus, at normal incidence 
the O-ray will not be deviated and the E-ray in general will. Now, it is
possible to cut the crystal in such a way that neither the O- nor the E-ray is
deviated at normal incidence, and both then travel through the crystal at the
same speed. This direction defines the optic axis of the crystal. If the crystal
has only one such axis, it is said to be uniaxial or monoaxial. Calcite is a
uniaxial crystal. If the crystal is cut so that at normal incidence the light
travels at right angles to the optic axis, neither the O- nor the E-ray is
deviated, but in this case the speeds of the rays are different.

In defining the lines of P-states, we have until now been somewhat
arbitrary. A line was placed on one P-filter and the lines of all the others
were drawn in such a way that any pair of filters would obey Malus' law.
We now redraw all such lines on our P-filters in such a way that the line of
the P-state which is the O-ray is always perpendicular to the optic axis of a
calcite crystal. This standardizes the lines on our P-filters so that they agree
with those on all other P-filters. For any angle of incidence, the polarization
of the O-ray is always such that the line of its P-state is perpendicular to the
optic axis.

With the remarks above in mind let us now see how anisotropic retarders
may be constructed from uniaxial crystals. If a flat plate is cut from a
uniaxial crystal in such a way that the optic axis is parallel to the flat faces,
and light in a P-state impinges on this plate normal to the faces, the P-state's
line can make any angle with the optic axis from 0 to 90°. Since the
transmission through the plate is not along the optic axis, the speed depends
on which P-state is used. However, if the line of the P-state is either parallel
or perpendicular to the optic axis, the state is transmitted unchanged, but the
speeds of transmission are different in the two cases. Let nₑ and nₒ be the
indices of refraction for the P-states whose lines are parallel and
perpendicular, respectively, to the optic axis. Uniaxial crystals are classified
as positive or negative according to whether nₒ < nₑ or nₒ > nₑ. Calcite, for
example, is a negative crystal with nₑ = 1.486 and nₒ = 1.658 for
monochromatic light of wavelength λ = 5893 Å. The important point is that a
P-state whose line is parallel to the optic axis will have a retardation different
from that of a P-state whose line is perpendicular to it. In either kind of
crystal the index of refraction for a P-state whose line lies along one of these
two directions is less than that for a P-state whose line lies along the other.
The line of the P-state with the smaller index will be called hereafter the fast
axis, and the other the slow axis, of the plate.
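The fast/slow distinction leads directly to a design rule for retarder plates. A plate of thickness d adds a relative retardation d(nₒ − nₑ) between the two P-states, so a plate that retards one of them by a chosen fraction of a wavelength needs d = fraction × λ/(nₒ − nₑ). This is a minimal sketch using the calcite indices quoted above; the quarter-wave result can be compared with the answer quoted for Problem 2.3.

```python
# Sketch: thickness of a calcite retarder plate cut with its optic axis parallel
# to the faces. The two P-states (lines parallel and perpendicular to the axis)
# accumulate a relative retardation d*(n_o - n_e); for a quarter-wave plate this
# should equal wavelength/4. Indices are the calcite values quoted above.

N_E = 1.486        # line parallel to the optic axis (fast axis for calcite)
N_O = 1.658        # line perpendicular to the optic axis (slow axis)

def plate_thickness(wavelength, fraction, n_slow=N_O, n_fast=N_E):
    """Thickness giving a relative retardation of `fraction` of a wavelength."""
    return fraction * wavelength / (n_slow - n_fast)

lam_mm = 5893e-7   # 5893 angstroms in millimetres (1 angstrom = 1e-7 mm)
quarter = plate_thickness(lam_mm, 0.25)
half = plate_thickness(lam_mm, 0.5)
print(f"quarter-wave: {quarter:.3e} mm, half-wave: {half:.3e} mm")
```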


2.9 POLARIZATION BY ABSORPTION 


The birefringent materials discussed in the previous section were all assumed 
to be transparent. Although a polarization state may be modified by an 
anisotropic substance, the total incident energy flux was assumed to be
transmitted. Tourmaline is a birefringent crystal which has anisotropic
absorption properties. If a sufficiently thick plate is cut from a tourmaline crystal
such that its broad faces are parallel to the optic axis, the E-ray is transmitted 
but the O-ray is partially absorbed. This phenomenon is known as dichroism. 

A common and inexpensive material which polarizes by absorption is 
known by the name of Polaroid. Sheets of Polaroid can be used to construct 
a device which approximates the P-filter. If light in a P-state is incident on 
a Polaroid sheet, rotation of the sheet about the incident beam direction gives 
a maximum and minimum transmission. 

In an absorption polarizer the absorption of light other than the desired 
P-state is never perfect, and some of the desired P-state will be absorbed. 
In a later chapter the generalization of Malus’ law for a pair of identical 
dichroic polarizers will be found to be 


$$I = H_0\cos^2\theta + H_{90}\sin^2\theta. \qquad (2.5)$$


H₀ and H₉₀ may be written as

$$H_0 = \tfrac{1}{2}(k_1^2 + k_2^2), \qquad H_{90} = k_1 k_2.$$


Here k₁ and k₂ are known as the major and minor principal transmittances,
respectively. Values of these transmittances are given in Table 2.1 as a
function of wavelength for three commercial types of Polaroid.


2.10 OPTICAL ACTIVITY


In addition to birefringence there is the interesting phenomenon known as 
optical activity. Materials exhibiting this property may be isotropic and 
homogeneous, but as light in a P-state travels through the material the line 
of the P-state will rotate about the beam direction. In a later chapter optical 
activity will be seen to result from different velocities for R- and L-states. 
The amount of rotation depends on the thickness of the material and, in the 
case of solutions, also on the concentration. Common table sugar is optically 
active. In fact, a common test for the purity of sugar is based on the amount 
of rotation produced by a standard length of sugar solution. 


Table 2.1
Values of the Principal Transmittances k₁ and k₂ of Polaroid Type HN Polarizers

Wavelength        HN22                 HN32                 HN38
   (Å)        k₁      k₂           k₁      k₂           k₁      k₂
  4000       0.21   0.00001       0.47   0.003         0.67   0.04
  5000       0.55   0.000002      0.75   0.00005       0.86   0.005
  6000       0.43   0.000002      0.67   0.00002       0.79   0.0003
  7000       0.59   0.000003      0.77   0.00003       0.86   0.0007

The values express transmittance as a pure number, not a percentage.
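Equation (2.5) can be evaluated directly from the table. The sketch below computes H₀ and H₉₀ for the HN38 sheet at 5000 Å and the pair transmittance at a few angles. As an interpretation on our part, I is read as the fraction of an unpolarized incident beam of unit intensity transmitted by the pair, which is what the form of H₀ and H₉₀ suggests; only one table row is entered, as an example.

```python
import math

# Sketch: evaluating the generalized Malus law, Eq. (2.5), for a pair of identical
# Polaroid sheets using principal transmittances from Table 2.1. Interpretation
# (ours): I is the fraction of an unpolarized, unit-intensity beam transmitted.

def h_values(k1, k2):
    """H0 and H90 for two identical dichroic polarizers."""
    h0 = 0.5 * (k1 ** 2 + k2 ** 2)
    h90 = k1 * k2
    return h0, h90

def pair_transmittance(k1, k2, theta_deg):
    """I = H0 cos^2(theta) + H90 sin^2(theta), Eq. (2.5)."""
    h0, h90 = h_values(k1, k2)
    t = math.radians(theta_deg)
    return h0 * math.cos(t) ** 2 + h90 * math.sin(t) ** 2

k1, k2 = 0.86, 0.005                  # HN38 at 5000 angstroms (Table 2.1)
h0, h90 = h_values(k1, k2)
print(f"H0 = {h0:.4f}, H90 = {h90:.5f}, H0/H90 = {h0 / h90:.0f}")
for angle in (0, 45, 90):
    print(angle, round(pair_transmittance(k1, k2, angle), 5))
```

The ratio H₀/H₉₀ is a convenient single number for comparing how well crossed sheets extinguish the light.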


PROBLEMS 


2.1 A beam of light in an R-state is incident on a totally absorbing disk which is
free to rotate about an axis parallel to the beam. The mass of the disk is m, its
specific heat is C, and its moment of inertia about the axis of rotation is I. Starting
from rest the disk acquires an angular velocity Ω. Show that the temperature
increase ΔT of the disk in this time is given by


$$\Delta T = \frac{I\Omega}{mC}\left(\frac{2\pi c}{\lambda} - \frac{\Omega}{2}\right),$$


where λ is the wavelength of the incident light, and c is the velocity of light.




2.2 A beam of light in an L-state passes through a P-filter which is free to rotate 
about the axis of the beam. Why would you expect the P-filter to rotate? In what 
direction will it rotate? 


Answer: Counter-clockwise when viewed from the source. 


2.3 A calcite plate is cut with the optic axis parallel to the plane of the plate. 
How thick must the plate be to retard the O-ray one quarter-wavelength with respect
to the E-ray if the wavelength of the light is 5893 Å?


Answer: 8.57 × 10⁻⁴ mm.


2.4 The average number of photons absorbed in 1 sec by a counter placed in a 
monoenergetic beam is N. If the counter now moves with velocity V in the direction 
of the beam's propagation, what average number will be counted? Where did the 
extra photons "go"? 


Answer: N' = (1 − V/c)N.


2.5 A photon of wavelength $\lambda$ is absorbed in a detector of mass $m$, which is at rest but free to move. Show that the fraction of the incident photon's energy which is not available for increasing the detector's temperature is $h/(2\lambda mc)$. Compute this fraction for $\lambda = 5000$ Å and $m = 1$ kg.


Answer: $2.21 \times 10^{-36}$.
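A short check of the recoil fraction in Problem 2.5, assuming nonrelativistic recoil of the detector and rounded values of $h$ and $c$ (the constant and function names are illustrative):

```python
# Recoil check for Problem 2.5 (nonrelativistic recoil of a free detector).
H = 6.626e-34  # Planck's constant, J s (rounded)
C = 2.998e8    # speed of light, m/s (rounded)


def recoil_fraction(wavelength_m: float, mass_kg: float) -> float:
    """Fraction of the photon energy hc/lambda carried off as detector kinetic energy.

    Momentum conservation gives the detector momentum p = h/lambda, so its
    kinetic energy is p**2 / (2m); dividing by hc/lambda gives h / (2 lambda m c).
    """
    photon_energy = H * C / wavelength_m
    momentum = H / wavelength_m
    recoil_energy = momentum ** 2 / (2.0 * mass_kg)
    return recoil_energy / photon_energy


fraction = recoil_fraction(5000e-10, 1.0)
print(f"{fraction:.3g}")  # prints 2.21e-36
```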


2.6 Repeat Problem 2.5 for a detector moving with velocity $+V$ in the direction of the beam's propagation. If the increase in temperature of the detector is used to measure the energy of the photons and, hence, their wavelength, show that approximately

$$\lambda' \approx \lambda\left(1 + \frac{V}{c}\right),$$

where $\lambda$ is the wavelength measured when the detector is at rest and $\lambda'$ is the measured wavelength when it is moving. The observer moving with the detector assumes that all of the photon's energy is available for increasing the temperature.


2.7 For the arrangement shown in Fig. 2.3, what pure state would be detected in each segment of the path along both arms of the interferometer? What would be the consequence of moving the L-filter in the interferometer from the segment MM; to the segment M,M? If this were done, where would the R-filter have to be placed to restore the original results?


2.8 Prove that the optical path, nd, is the distance light would have traveled 
in vacuo during the time in which it travels the distance d in a medium of refractive 
index n. 


2.9 Design an interferometer which could be used in place of the Mach-Zehnder 
interferometer, using two full-silvered mirrors but only one half-silvered mirror. 
Utilize orthogonal P-filters instead of R- and L-filters in the arms of the interfero- 
meter. 




2.10 When the index of refraction is not constant over the path length of a ray, the optical path length is defined by

$$L = \int_A^B n\,dl,$$

where $A$ and $B$ represent two points on the ray. Geometrical optics may be derived from Fermat's principle, which states that of all possible paths from $A$ to $B$ a light ray will travel the path for which $L$ is an extremum. For our purposes this means that $L$ is either larger or smaller on this path than it is on all nearby paths. Derive Snell's law and the law of reflection from Fermat's principle.
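Problem 2.10 asks for an analytic derivation; as a numerical illustration only, the sketch below minimizes the optical path over the point where a ray meets a flat interface (or mirror) and checks that the minimizing path satisfies Snell's law and the law of reflection. The geometry, indices, and helper names are hypothetical, and golden-section search is a generic minimizer chosen for simplicity, not a method from the text.

```python
import math


def optical_path(x, n1, n2, h1, h2, d):
    """L = n1*|AX| + n2*|XB| with A = (0, h1), X = (x, 0) on the surface, and B at
    horizontal distance d and vertical distance h2 from the surface (below it for
    refraction, above it for reflection; |XB| is the same expression either way)."""
    return n1 * math.hypot(x, h1) + n2 * math.hypot(d - x, h2)


def golden_section_min(f, lo, hi, tol=1e-12):
    """Minimize a unimodal function f on [lo, hi] by golden-section search.
    Near a flat minimum, floating-point rounding limits accuracy to about 1e-8."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    e = a + inv_phi * (b - a)
    while b - a > tol:
        if f(c) < f(e):
            b, e = e, c
            c = b - inv_phi * (b - a)
        else:
            a, c = c, e
            e = a + inv_phi * (b - a)
    return 0.5 * (a + b)


def crossing_point(n1, n2, h1, h2, d):
    """Point on the surface where the least-optical-path ray meets it."""
    return golden_section_min(lambda x: optical_path(x, n1, n2, h1, h2, d), 0.0, d)


# Refraction: A lies h1 above the interface in index n1, B lies h2 below it in index n2.
n1, n2, h1, h2, d = 1.0, 1.5, 1.0, 1.0, 2.0
x = crossing_point(n1, n2, h1, h2, d)
sin_i = x / math.hypot(x, h1)            # sine of the angle of incidence (from the normal)
sin_t = (d - x) / math.hypot(d - x, h2)  # sine of the angle of refraction
print(n1 * sin_i, n2 * sin_t)            # the two products agree: Snell's law

# Reflection: A and B are on the same side of a mirror, in one medium (n1 = n2).
xr = crossing_point(1.0, 1.0, 1.0, 0.5, 2.0)
sin_in = xr / math.hypot(xr, 1.0)
sin_out = (2.0 - xr) / math.hypot(2.0 - xr, 0.5)
print(sin_in, sin_out)  # equal sines: angle of incidence equals angle of reflection
```

Here the extremum happens to be a minimum, so a minimizer suffices; Fermat's principle itself only requires stationarity.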


CHAPTER 3 


THE REPRESENTATION AND 
PRODUCTION OF STATES 


Once it has been observed that the expression $|E\rangle = a|R\rangle + b|L\rangle$ is similar to that encountered in the addition of vectors in a two-dimensional vector space, it may be desirable to set down the axioms which the elements of any vector space must satisfy. If we did this, the physical interpretation of many of the axioms in terms of polarization states would be obvious. However, since the interpretation of others would be impossible at this point, we will not take this approach. Instead, we will assume that polarization states can be represented by two-dimensional complex vectors and proceed to develop a self-consistent calculus of polarization states.


3.1 THE VECTOR AND MATRIX REPRESENTATION 
Vectors are written in many texts as

$$\mathbf{V} = A\mathbf{e}_1 + B\mathbf{e}_2 + C\mathbf{e}_3,$$

where $A$, $B$, and $C$ are the vector components and the $\mathbf{e}_i$ are orthogonal unit vectors. There is no special reason, aside from uniformity, why vectors must be written in this fashion. So long as the order of the components is specified and maintained, the vector $\mathbf{V}$ can be written as

$$\begin{pmatrix} A \\ B \\ C \end{pmatrix} \quad\text{or}\quad (A, B, C).$$


These representations are, respectively, called column ($3 \times 1$ matrix) and row ($1 \times 3$ matrix) vectors. We now remove the restrictions imposed by our initial choice of a vector in three-dimensional Cartesian space. First, we can define an $n$-dimensional vector space, where $n$ is any positive integer. Ordinarily, this means that a vector in this space can be expressed as a linear combination of $n$ orthogonal unit vectors. Second, the components of a vector will not be limited to the domain of real numbers; they now may be complex.


Since any two orthogonal states in superposition give another E-state, we will represent polarization states by two-dimensional complex vectors. We may choose either row or column vectors, and Dirac's notation allows the use of both. With each state we associate not only a ket symbol $|E\rangle$ but also a symbol $\langle E|$ called a bra. Kets will be represented by column vectors and bras by row vectors. How are the elements in the bra to be related to those of the ket? Assume that a state $E$ is represented by the ket

$$|E\rangle = \begin{pmatrix} A \\ B \end{pmatrix}.$$

Then the bra representing the same state is defined by

$$\langle E| = (A^*, B^*).$$

In other words, the bra vector is the complex conjugate of the transpose of the ket vector, that is, its adjoint. We will see the usefulness of this definition shortly.
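Suppose $E_1$ and $E_2$ are two different E-states with

$$|E_1\rangle = \begin{pmatrix} A_1 \\ B_1 \end{pmatrix}, \qquad |E_2\rangle = \begin{pmatrix} A_2 \\ B_2 \end{pmatrix},$$

and let us consider the meaning of the product $\langle E_1|E_2\rangle$. Since the bra vector is a row vector and the ket is a column vector, the symbol $\langle E_1|E_2\rangle$ is the product of two matrices. The first matrix is $1 \times 2$, the second is $2 \times 1$, and the product will be $1 \times 1$, or a scalar. Multiplication yields

$$\langle E_1|E_2\rangle = A_1^* A_2 + B_1^* B_2,$$

and we note that this implies

$$\langle E_1|E_2\rangle = \langle E_2|E_1\rangle^*.$$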


The product of a bra on the left with a ket on the right is called a bracket product, and is a generalization of the familiar scalar product of vectors whose components are real. If $|E_1\rangle = |E_2\rangle$, then the bracket product is a real number and is a generalization to complex vector space of the square of the length of a vector. If any two kets $|E_1\rangle$ and $|E_2\rangle$ satisfy the relation $\langle E_1|E_2\rangle = 0$, they are said to be orthogonal. If they also satisfy the relation

$$\langle E_1|E_1\rangle = \langle E_2|E_2\rangle = 1,$$

they are said to be orthonormal. Another possible product is formed when the bra $\langle E_2|$ is multiplied from the left by the ket $|E_1\rangle$. This is written as $|E_1\rangle\langle E_2|$. Interpreting this as a matrix multiplication, we have a $2 \times 1$ matrix on the left and a $1 \times 2$ matrix on the right. This product yields the $2 \times 2$ matrix

$$|E_1\rangle\langle E_2| = \begin{pmatrix} A_1 A_2^* & A_1 B_2^* \\ B_1 A_2^* & B_1 B_2^* \end{pmatrix}.$$

For future use we observe that this matrix satisfies the relation

$$|E_2\rangle\langle E_1| = \left\{|E_1\rangle\langle E_2|\right\}^\dagger,$$

where the dagger symbol indicates the adjoint of $|E_1\rangle\langle E_2|$.

Since the bracket product $\langle E_1|E_2\rangle$ is a scalar, it may be freely moved through a matrix expression. Thus, for example, for any three kets $|E_1\rangle$, $|E_2\rangle$, and $|E_3\rangle$ we may write

$$\langle E_1|E_2\rangle\,|E_3\rangle = |E_3\rangle\,\langle E_1|E_2\rangle.$$

But, since matrix multiplication is associative, this yields

$$\langle E_1|E_2\rangle\,|E_3\rangle = \left(|E_3\rangle\langle E_1|\right)|E_2\rangle.$$

This manipulative trick will be used frequently.
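The bra-ket rules above can be checked on concrete vectors. This minimal sketch represents kets as two-element Python lists of complex numbers (a representation chosen for convenience; the helper names and sample kets are hypothetical) and verifies $\langle E_1|E_2\rangle = \langle E_2|E_1\rangle^*$, $|E_2\rangle\langle E_1| = \{|E_1\rangle\langle E_2|\}^\dagger$, and the associativity trick.

```python
# Minimal bra-ket algebra on 2-component complex vectors, using plain Python lists.

def bra(ket):
    """<E| is the adjoint of |E>: the conjugated components, read as a row."""
    return [z.conjugate() for z in ket]


def bracket(ket1, ket2):
    """Bracket product <E1|E2> = A1* A2 + B1* B2."""
    return sum(b * k for b, k in zip(bra(ket1), ket2))


def outer(ket1, ket2):
    """Outer product |E1><E2| as a 2 x 2 matrix (list of rows)."""
    return [[a * b for b in bra(ket2)] for a in ket1]


def adjoint(m):
    """Conjugate transpose of a square matrix."""
    n = len(m)
    return [[m[j][i].conjugate() for j in range(n)] for i in range(n)]


def apply(m, ket):
    """Matrix times column vector."""
    return [sum(m[i][j] * ket[j] for j in range(len(ket))) for i in range(len(m))]


def close(u, v, tol=1e-12):
    """Element-wise comparison of equal-shape nested lists of complex numbers."""
    if isinstance(u, list):
        return all(close(a, b, tol) for a, b in zip(u, v))
    return abs(u - v) < tol


E1 = [1 + 2j, 0.5 - 1j]
E2 = [-0.3j, 2 + 0j]
E3 = [0.7 + 0j, 1j]

print(close(bracket(E1, E2), bracket(E2, E1).conjugate()))  # <E1|E2> = <E2|E1>*
print(close(outer(E2, E1), adjoint(outer(E1, E2))))         # |E2><E1| = (|E1><E2|)^dagger
lhs = [bracket(E1, E2) * z for z in E3]                     # <E1|E2> |E3>
rhs = apply(outer(E3, E1), E2)                              # (|E3><E1|) |E2>
print(close(lhs, rhs))                                      # associativity trick
print(bracket(E1, E1))                                      # real: the squared "length" of |E1>
```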


3.2 THE BRACKET PRODUCT AND PROBABILITY AMPLITUDES 


We have seen in Chapter 2 that a correspondence can be made between kets and states of light, and the question arises as to how the bracket product can be interpreted in view of this correspondence. The correspondence between kets and states of light was inferred from the interferometer experiments, which indicated that any E-state can be expressed as

$$|E\rangle = a|R\rangle + b|L\rangle, \qquad (3.1)$$

where $a$ and $b$ may both be complex. Since the kets $|R\rangle$ and $|L\rangle$ are column vectors, we are free to choose any pair of column vectors to represent the R- and L-states, provided that our choice reflects the fact that these states are mutually exclusive. This situation is the same as that for real vectors, where any vector with real components can be expressed as a sum of three orthogonal vectors. In real space, unit vectors along the x-, y-, and z-axes are linearly independent; no one of them can be expressed as a vector sum of the other two. In this sense the unit vectors are mutually exclusive. The kets $|R\rangle$ and $|L\rangle$ have the same property: there is no way of obtaining an R-state through a superposition of L-states.

Of the many column vectors which satisfy the above requirements, the simplest choice is

$$|R\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \qquad |L\rangle = \begin{pmatrix} 0 \\ 1 \end{pmatrix},$$

which allows us to write the ket $|E\rangle = a|R\rangle + b|L\rangle$ as

$$|E\rangle = \begin{pmatrix} a \\ b \end{pmatrix}.$$

We emphasize that this particular choice of a vector representation is neither unique nor forced on us by physical requirements. It is merely the simplest choice. Later we shall find other representations for $|R\rangle$ and $|L\rangle$ which are convenient for certain problems. To assign the unit vectors $\begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and $\begin{pmatrix} 0 \\ 1 \end{pmatrix}$ to two particular mutually exclusive states is known as picking a basis for representing polarization states.

Four possible bracket products can be written with the pair of kets $|R\rangle$ and $|L\rangle$, namely,

$$\langle R|R\rangle = \langle L|L\rangle = 1, \qquad \langle R|L\rangle = \langle L|R\rangle = 0.$$


What operational significance can these products have in terms of our 
experiments in the preceding chapter? 

The following interpretation is made. The ket in the product is a statement that light has been prepared in either a pure R- or a pure L-state. The bra in the product indicates the filter used to analyze the prepared state. For example, the product $\langle R|R\rangle$ indicates that light in an R-state is analyzed by an R-filter. Light in an R-state analyzed by an R-filter is completely transmitted as an R-state, and $\langle R|R\rangle = 1$ means that the probability of transmission of light in an R-state through an R-filter is unity; the outcome of the experiment is completely predictable. Similarly, $\langle L|R\rangle = 0$ says that the probability of transmission of an R-state through an L-filter is zero. Another way to phrase this is to say that if light is known to be in an R-state, the probability that it is found in an R-state is unity, and the probability that it is found in the orthogonal state is zero. It is important to recognize that the bracket products listed above share an essential feature: if a beam of light is known to be in a pure polarization state, then analysis of this beam has a completely predictable outcome only with filters which are identical to the one that produced the state or to its mutually exclusive partner. Only when this condition is fulfilled is the interpretation of the bracket product as a probability consistent with experiment. Since the probability that any E-state will pass through the E-filter that creates it is unity, we are consistent in setting $\langle E|E\rangle = 1$ for any pure state $|E\rangle$.

Let us investigate the implications of this interpretation of the bracket product for the superposition in Eq. (3.1). The light emerging from the interferometer in Fig. 2.3 is in the state $|E\rangle$, and analysis of this state by a filter with respect to which it is pure has a completely predictable outcome.




By taking the bracket product of $|E\rangle$ in Eq. (3.1) with $\langle R|$ and $\langle L|$, we obtain

$$\langle R|E\rangle = a, \qquad \langle L|E\rangle = b.$$

Thus,

$$|E\rangle = \langle R|E\rangle\,|R\rangle + \langle L|E\rangle\,|L\rangle,$$

and

$$\begin{aligned} \langle E|E\rangle &= \langle R|E\rangle\langle E|R\rangle + \langle L|E\rangle\langle E|L\rangle \\ &= |\langle R|E\rangle|^2 + |\langle L|E\rangle|^2 \\ &= |a|^2 + |b|^2. \end{aligned}$$

If $p_1$ and $p_2$ are the probabilities of transmission of the state $|E\rangle$ through an R- and an L-filter, respectively, we know from Eq. (2.2) that $p_1 + p_2 = 1$, and, since $\langle E|E\rangle = 1$, we may make the correspondence

$$p_1 = |\langle R|E\rangle|^2, \qquad p_2 = |\langle L|E\rangle|^2.$$

In other words, the probability of transmission of the state $|E\rangle$ through an R-filter is not the bracket product $\langle R|E\rangle$ but $|\langle R|E\rangle|^2$. Since bracket products do not generally give probabilities, they are called probability amplitudes or simply amplitudes. The product $\langle R|E\rangle$, for example, is the amplitude for an E-state to be transmitted through an R-filter.

A method of measuring these probabilities is suggested by the very meaning of the amplitudes. Let the intensity of the light emerging from the interferometer be measured as $I_0$. By placing an R-filter in the emergent beam, a measurement of the intensity of the light transmitted by the R-filter can be made, and a similar measurement is possible with an L-filter. Since the amplitudes for the transmission of an E-state through an R- or an L-filter are, respectively, $\langle R|E\rangle$ and $\langle L|E\rangle$, the probabilities for such transmission are

$$\frac{I_1}{I_0} = |\langle R|E\rangle|^2, \qquad \frac{I_2}{I_0} = |\langle L|E\rangle|^2,$$

where $I_1$ and $I_2$ are the light intensities transmitted by the R- and L-filters.

Since the intensity measurements correspond to the squares of the absolute values of the amplitudes, such measurements cannot distinguish the amplitudes $\langle R|E\rangle$ and $e^{i\phi}\langle R|E\rangle$. In other words, the phases $\phi$ of these amplitudes cannot be determined. It is possible, however, to have apparatus in which amplitudes may interfere such that phase differences between amplitudes become very important. Interfering amplitudes will be of importance in Chapter 6.
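As a sketch of the amplitude interpretation, using the basis $|R\rangle = (1, 0)$, $|L\rangle = (0, 1)$ and illustrative coefficients $a$ and $b$ that are not from the text, the code below computes $p_1 = |\langle R|E\rangle|^2$ and $p_2 = |\langle L|E\rangle|^2$ and shows that an overall phase factor leaves them unchanged.

```python
import cmath

# R/L basis of the text: |R> = (1, 0), |L> = (0, 1).
R = [1 + 0j, 0j]
L = [0j, 1 + 0j]


def bracket(ket1, ket2):
    """<E1|E2> = sum of conj(first) * second."""
    return sum(a.conjugate() * b for a, b in zip(ket1, ket2))


# An illustrative normalized E-state: |a|^2 + |b|^2 = 0.36 + 0.64 = 1.
a = cmath.rect(0.6, 0.4)   # modulus 0.6, arbitrary phase 0.4 rad
b = cmath.rect(0.8, -1.1)  # modulus 0.8, arbitrary phase -1.1 rad
E = [a * r + b * l for r, l in zip(R, L)]  # |E> = a|R> + b|L>

p1 = abs(bracket(R, E)) ** 2  # probability of transmission through an R-filter
p2 = abs(bracket(L, E)) ** 2  # probability of transmission through an L-filter
print(round(p1, 6), round(p2, 6), round(p1 + p2, 6))  # prints 0.36 0.64 1.0

# Multiplying the state by a phase e^{i phi} leaves every intensity unchanged.
E_shifted = [cmath.exp(0.9j) * z for z in E]
print(round(abs(bracket(R, E_shifted)) ** 2, 6))  # prints 0.36
```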

A very important amplitude is that given by the bracket product $\langle P'|P\rangle$. From Eq. (2.1) we know that the probability that a P-state will pass through a P'-filter is given by

$$p = \frac{I}{I_{\max}} = \cos^2\theta,$$

where $\theta$ is the angle between the line of the P-filter which prepared the P-state and the line of the P'-filter. From the previous discussion we may set

$$\langle P'|P\rangle = e^{i\phi}\cos\theta.$$

But we have no way of evaluating the phase $\phi$. However, as will become apparent, if we arbitrarily fix this phase, the phases in all other bracket products will be determined. Without loss of generality we will therefore set $\phi = 0$ and write

$$\langle P'|P\rangle = \langle P|P'\rangle = \cos\theta. \qquad (3.2)$$


In an effort to correlate the ket-vector representation and bracket products with the observable polarization states, we have relied almost entirely on the interferometer. We will now discuss a device which has just the opposite effect to that of an interferometer. Instead of combining light beams to produce a polarization state, this device resolves, or analyzes, light in a given polarization state into its components; here it splits the incident beam into R- and L-states. We want to use this apparatus, shown in Fig. 3.1, to give another illustration of how amplitudes are correlated with experiment. It will be assumed that the device which resolves light into its components is completely transmitting, and it will simply be called a "resolver".

Light coming from the left strikes the E-filter and is transmitted as a prepared E-state, which traverses the resolver and is split into two beams, labeled 1 and 2, which are in the states $|R\rangle$ and $|L\rangle$, respectively. $D_1$ and $D_2$ are identical photon counters which are polarization-insensitive.

When the incident light is very weak, the two counters never click simultaneously; the light coming out of the resolver is detected in either an R- or an L-state. As before, we can express the light prepared in the E-state as a superposition

$$|E\rangle = a|R\rangle + b|L\rangle,$$

where the bracket products are related by

$$\langle E|E\rangle = |\langle R|E\rangle|^2 + |\langle L|E\rangle|^2.$$

The light incident on the resolver is known to be in the state $|E\rangle$, and the probability that the light is detected in an R-state is $|\langle R|E\rangle|^2$. The probability of detecting the light in an L-state is $|\langle L|E\rangle|^2$. If counts are taken for a sufficiently long time, the number of counts registered by $D_1$ is proportional to $|\langle R|E\rangle|^2$, and the number recorded by $D_2$ over the same interval is proportional to $|\langle L|E\rangle|^2$. Since the light is prepared in an E-state, a measure of $\langle E|E\rangle$ can be obtained by removing the resolver and using either counter to register the number of counts in the same interval. This last step would not be necessary if we continued to use the convention $\langle E|E\rangle = 1$. However, this much rigidity is not always desirable.

Fig. 3.1. Schematic representation of a resolver.

There are a number of alternative, closely related interpretations of the bracket product. While keeping in mind our remarks about the dangers of regarding light as discrete, we can think of $|\langle R|E\rangle|^2$ as giving the probability of a single photon in an E-state traversing an R-filter. We can also think of a beam passing an initial E-filter such that $N$ photons per second pass the filter. Then we could write $\sqrt{N}\,|E\rangle = |E'\rangle$, so that $\langle E'|E'\rangle = N$. A similar interpretation is likewise possible with the intensity $I$ of a beam, such that $\langle E''|E''\rangle = I$, where $|E''\rangle = \sqrt{I}\,|E\rangle$. In general, the bracket product of a state with itself can have many related meanings. Exactly which meaning is intended will either be explicitly stated or evident from the context. This procedure of multiplying a bra or ket by a real constant to make $\langle E|E\rangle$ fit a particular interpretation is known as normalization.
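The counting experiment with the resolver can be imitated by a toy simulation. This sketch assumes an ideal, lossless resolver, ideal counters, and independent photons, each sent to $D_1$ with probability $|\langle R|E\rangle|^2$ and otherwise to $D_2$; the function name, state, and photon rate are hypothetical. It also shows the rescaling $|E'\rangle = \sqrt{N}\,|E\rangle$ giving $\langle E'|E'\rangle = N$.

```python
import math
import random


def simulate_resolver(prob_r, n_photons, seed=0):
    """Idealized resolver: each photon is counted by D1 (R-state) with probability
    prob_r = |<R|E>|^2 and otherwise by D2 (L-state); never by both."""
    rng = random.Random(seed)
    d1 = sum(1 for _ in range(n_photons) if rng.random() < prob_r)
    return d1, n_photons - d1


# Illustrative E-state amplitudes, normalized so that |a|^2 + |b|^2 = 1.
a, b = 0.6, 0.8j
prob_r = abs(a) ** 2

n = 100_000
d1, d2 = simulate_resolver(prob_r, n)
print(d1 / n, d2 / n)  # close to 0.36 and 0.64

# Normalization to a photon rate N: |E'> = sqrt(N)|E> gives <E'|E'> = N.
N = 2500.0
E_prime = [math.sqrt(N) * a, math.sqrt(N) * b]
print(sum(abs(z) ** 2 for z in E_prime))  # approximately 2500.0
```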


3.3 THE SUPERPOSITION OF P-STATES 


Until now we have written E-states as a coherent superposition of the orthogonal kets $|R\rangle$ and $|L\rangle$ only. That this is not the only possibility was discussed in Section 2.5, and in what follows we derive the general standard form for a state expressed as the superposition of any pair of orthogonal states. The interpretation of the bracket product in the preceding section will then permit an analysis of a state which is expressed as the superposition of orthogonal P-states.

When a polarization state is written as a superposition of P-states with the basis

$$|P_x\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \qquad |P_y\rangle = \begin{pmatrix} 0 \\ 1 \end{pmatrix},$$

the vector representing the state is called the Jones vector, and the manipulation of these vectors is called the Jones calculus. However, we shall refer to any two-dimensional complex representation of a polarization state as a Jones vector and to the manipulation of Jones vectors as the Jones calculus.

Now let $|E_1\rangle$ and $|E_2\rangle$ be any two orthogonal states and superimpose them as

$$|E\rangle = A|E_1\rangle + B|E_2\rangle,$$

where the constants $A$ and $B$ may be complex and can be written in the form

$$A = a e^{i\gamma}, \qquad B = b e^{i\delta},$$

with $a$, $b$, $\gamma$, and $\delta$ being real. This superposition can be rewritten as

$$|E\rangle = e^{i\psi}\left(a e^{-i\phi/2}|E_1\rangle + b e^{i\phi/2}|E_2\rangle\right), \qquad (3.3)$$

where

$$\psi = \frac{\gamma + \delta}{2}, \qquad \phi = \delta - \gamma.$$

In any experiment in which the state $|E\rangle$ is to be detected in another pure state $|E'\rangle$, the phase factor $e^{i\psi}$ will vanish in $|\langle E'|E\rangle|^2$. For this reason the factor $e^{i\psi}$ is often neglected, and the standard form for the coherent superposition of two orthogonal states is that of Eq. (3.3) with $e^{i\psi}$ omitted.

Let $|P_x\rangle$ and $|P_y\rangle$ designate pure states which have been prepared by P-filters whose lines are along the x- and y-axes. When light in the state $|P_x\rangle$ is analyzed by a P-filter, it is completely transmitted when the line of the filter is along the x-axis. Similarly, the state $|P_y\rangle$ is pure with respect to a P-filter whose line is along the y-axis. Using these two states as a basis, we may express any E-state as

$$|E\rangle = a_x e^{-i\phi/2}|P_x\rangle + a_y e^{i\phi/2}|P_y\rangle.$$

Using an intensity normalization for this state, we obtain

$$I_0 = \langle E|E\rangle = a_x^2 + a_y^2.$$

In what follows we will identify particular E-states, e.g., R- and L-states, with specific choices of $a_x$, $a_y$, and $\phi$.

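We now analyze the state $|E\rangle$ by using a P-filter whose line is oriented at angle $\theta$ with the x-axis and, hence, at angle $(\pi/2 - \theta)$ with the y-axis. We obtain

$$\langle P|E\rangle = a_x e^{-i\phi/2}\langle P|P_x\rangle + a_y e^{i\phi/2}\langle P|P_y\rangle, \qquad (3.4)$$

and, using Eq. (3.2),

$$\begin{aligned} \langle P|E\rangle &= a_x e^{-i\phi/2}\cos\theta + a_y e^{i\phi/2}\cos(\pi/2 - \theta) \\ &= a_x e^{-i\phi/2}\cos\theta + a_y e^{i\phi/2}\sin\theta. \end{aligned}$$

First consider the special case $\phi = 0$ and compute the intensity transmitted by the P-filter:

$$I = |\langle P|E\rangle|^2 = (a_x\cos\theta + a_y\sin\theta)^2. \qquad (3.5)$$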




As we would expect, this is highly dependent on $\theta$. Some insight can be obtained by looking for the maxima and minima of $|\langle P|E\rangle|^2$. Setting $dI/d\theta = 0$ yields the solutions

$$\tan\theta_1 = -a_x/a_y, \qquad (3.6)$$
$$\cot\theta_2 = a_x/a_y. \qquad (3.7)$$

Investigation of $d^2I/d\theta^2$ shows that Eq. (3.6) yields a minimum and Eq. (3.7) a maximum. To find the maximum and minimum transmissions, we place the values given by these equations into Eq. (3.5) and obtain

$$(a_x\cos\theta_1 + a_y\sin\theta_1)^2 = 0,$$
$$(a_x\cos\theta_2 + a_y\sin\theta_2)^2 = a_x^2 + a_y^2 = I_0.$$

Since $\tan\theta_1 = -\cot\theta_2$, $\theta_1$ and $\theta_2$ must define two mutually perpendicular directions. Along the $\theta_1$-direction the transmission through a P-filter is zero, and along $\theta_2$ it is equal to the incident intensity. We thus conclude that $|E\rangle$ is another P-state, with its line oriented at angle $\theta_2$ with the x-axis, whenever $\phi = 0$ in standard form.

To see this more clearly, let us transform our reference system to an x'-axis in the $\theta_2$-direction, as shown in Fig. 3.2, where $\theta'$ and $\theta$ are the angles which the P-filter makes with the x'- and x-axes, respectively. Equation (3.5) becomes

$$I = [a_x\cos(\theta' + \theta_2) + a_y\sin(\theta' + \theta_2)]^2,$$

which reduces to

$$I = (a_x^2 + a_y^2)\cos^2\theta' = I_0\cos^2\theta'.$$

Fig. 3.2. Geometry for Malus' law. The line of the incident state is along the x'-axis. The line of the P-filter is along the P-axis. The transmitted intensity obeys $I = I_0\cos^2\theta'$.




Since the transmitted intensity obeys Malus' law, this completely identifies $|E\rangle$ as a pure P-state whose line is oriented along $\theta_2$. Consequently, any P-state can be written as a superposition

$$|P\rangle = a_x|P_x\rangle + a_y|P_y\rangle, \qquad (3.8)$$

where $a_x$ and $a_y$ are real. The line of this P-state makes an angle $\theta = \tan^{-1}(a_y/a_x)$ with the x-axis.
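Equations (3.5) to (3.7) and Malus' law can be checked numerically for the case $\phi = 0$. The amplitudes $a_x = 0.8$, $a_y = 0.6$ below are illustrative, not from the text, and the function name is hypothetical.

```python
import math


def transmitted_intensity(ax, ay, theta):
    """Eq. (3.5): intensity of a_x|P_x> + a_y|P_y> (phi = 0) behind a P-filter at angle theta."""
    return (ax * math.cos(theta) + ay * math.sin(theta)) ** 2


ax, ay = 0.8, 0.6                # illustrative real amplitudes
I0 = ax ** 2 + ay ** 2           # intensity normalization <E|E>
theta_max = math.atan2(ay, ax)   # Eq. (3.7): cot(theta_2) = a_x / a_y
theta_min = math.atan2(-ax, ay)  # Eq. (3.6): tan(theta_1) = -a_x / a_y

print(math.degrees(theta_max - theta_min))  # the two directions are 90 degrees apart
print(transmitted_intensity(ax, ay, theta_max), transmitted_intensity(ax, ay, theta_min))

# Malus' law measured from the theta_2 direction: I = I0 cos^2(theta - theta_2).
angles = [k * math.pi / 8 for k in range(8)]
worst = max(abs(transmitted_intensity(ax, ay, th) - I0 * math.cos(th - theta_max) ** 2)
            for th in angles)
print(worst)  # zero up to rounding error
```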

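Let us now consider the general case with $\phi \neq 0$. From Eqs. (3.4) and (3.5) we have

$$\langle P|E\rangle = a_x e^{-i\phi/2}\cos\theta + a_y e^{i\phi/2}\sin\theta,$$
$$|\langle P|E\rangle|^2 = a_x^2\cos^2\theta + a_y^2\sin^2\theta + a_x a_y\cos\theta\sin\theta\,(e^{i\phi} + e^{-i\phi}).$$

But $e^{i\phi} + e^{-i\phi} = 2\cos\phi$; thus,

$$|\langle P|E\rangle|^2 = a_x^2\cos^2\theta + a_y^2\sin^2\theta + 2a_x a_y\cos\theta\sin\theta\cos\phi,$$

which, of course, agrees with the previous result when $\phi = 0$.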

Following the development we have given, this equation should describe the transmission of a general elliptical state $|E\rangle$ through a P-filter. Particular cases of E-states are R- and L-states.

Thus, some particular choices of $a_x$, $a_y$, and $\phi$ should yield the representations for R- and L-states. The test for circular polarization is that transmission through a P-filter must be independent of the orientation $\theta$ of the P-filter. We proceed as follows: first we find the maximum and minimum of $|\langle P|E\rangle|^2$ as a function of $\theta$, then demand that the values of $a_x$, $a_y$, and $\phi$ which yield maximum and minimum values be independent of $\theta$; that is, we find those values for which every value of $\theta$ yields both a maximum and a minimum, or for which $|\langle P|E\rangle|^2$ as a function of $\theta$ is constant. Taking the derivative, we find

$$\frac{\partial}{\partial\theta}|\langle P|E\rangle|^2 = -2a_x^2\cos\theta\sin\theta + 2a_y^2\sin\theta\cos\theta + 2a_x a_y\cos^2\theta\cos\phi - 2a_x a_y\sin^2\theta\cos\phi.$$

Setting this equal to zero to find the maxima and minima, and dividing by $2a_x a_y\cos\theta\sin\theta$, yields

$$\left(\frac{a_y}{a_x} - \frac{a_x}{a_y}\right) + (\cot\theta - \tan\theta)\cos\phi = 0.$$

For this to be independent of $\theta$, we must have $\cos\phi = 0$, that is, $\phi = \pm\pi/2$. If $\cos\phi = 0$, then $a_x$ must equal $a_y$, and in order to normalize the state to unity if a probability interpretation is desired, we set $a_x = a_y = 1/\sqrt{2}$.
We shall pick a convention in which the choice $\phi = +\pi/2$ corresponds to a right circular state and $\phi = -\pi/2$ corresponds to a left circular state. Thus,

$$|R\rangle = \frac{1}{\sqrt{2}}\left\{e^{-i\pi/4}|P_x\rangle + e^{i\pi/4}|P_y\rangle\right\},$$
$$|L\rangle = \frac{1}{\sqrt{2}}\left\{e^{i\pi/4}|P_x\rangle + e^{-i\pi/4}|P_y\rangle\right\}. \qquad (3.9)$$

Fig. 3.3. The probability amplitude $|\langle P|E\rangle|$ as a function of the orientation $\theta$ of a P-filter.

The general case of elliptical polarization, in which $a_x \neq a_y$ and $\phi$ may take on any value in the range $0 < \phi < \pi/2$, is not quite so easy. For a typical E-state, a plot of $|\langle P|E\rangle|$ as a function of $\theta$ is given in Fig. 3.3. It is to be emphasized that $|\langle P|E\rangle|$ is not the ellipse which is shown, but the rather strange curve which touches the ellipse at the four points of contact. The relation between the ellipse and the curve is illustrated at the point p. The value of $|\langle P|E\rangle|$ in a direction $\theta$ is the projection of the ellipse along that direction. Thus, calling E-states elliptical is not an obvious choice.

Why $|\langle P|E\rangle|$ should be obtainable from an ellipse in this fashion, and the physical interpretation of the ellipse itself, will be deferred until the electromagnetic or wave interpretation is discussed in a later chapter. At the moment, the problem is to relate the values of $\theta$ at which $|\langle P|E\rangle|$ takes on its maximum and minimum values to $a_x$, $a_y$, and $\phi$. Returning to $(\partial/\partial\theta)|\langle P|E\rangle|^2$, we have


$$\left(\frac{a_y}{a_x} - \frac{a_x}{a_y}\right) + (\cot\theta - \tan\theta)\cos\phi = 0.$$

Setting $\tan R = a_y/a_x$ gives

$$-(\cot R - \tan R) + (\cot\theta - \tan\theta)\cos\phi = 0.$$

But by a trigonometric identity

$$\cot\alpha - \tan\alpha = 2\cot 2\alpha,$$

so that

$$-2\cot 2R + 2\cot 2\theta\cos\phi = 0,$$

or

$$\theta = \tfrac{1}{2}\tan^{-1}[\tan 2R\cos\phi],$$

which has two solutions separated by 90°. By considering the sign of $(\partial^2/\partial\theta^2)|\langle P|E\rangle|^2$, the maximum transmission direction is found to be the direction which lies closest to the x- or y-axis, according as $a_x > a_y$ or $a_x < a_y$. If $\phi = \pi/2$, then $\cos\phi = 0$ and

$$\theta_{\max,\min} = \tfrac{1}{2}\tan^{-1}(0) = 0, \pi/2,$$

so that the maximum and minimum transmissions occur at 0 and $\pi/2$. Thus, an E-state with maximum and minimum directions along the x- and y-axes can always be written as

$$|E\rangle = a_x e^{-i\pi/4}|P_x\rangle + a_y e^{i\pi/4}|P_y\rangle.$$
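The orientation formula $\theta = \frac{1}{2}\tan^{-1}[\tan 2R\cos\phi]$ can be compared with a brute-force scan of $|\langle P|E\rangle|^2$ over filter angles. This sketch assumes $a_x \neq a_y$ (so that $\tan 2R$ is finite); the particular $a_x$, $a_y$, $\phi$ and the helper names are illustrative.

```python
import math


def intensity(ax, ay, phi, theta):
    """|<P|E>|^2 = a_x^2 cos^2 + a_y^2 sin^2 + 2 a_x a_y cos sin cos(phi)."""
    c, s = math.cos(theta), math.sin(theta)
    return ax * ax * c * c + ay * ay * s * s + 2 * ax * ay * c * s * math.cos(phi)


def extremal_directions(ax, ay, phi):
    """Solutions of theta = (1/2) arctan(tan 2R cos phi), tan R = a_y/a_x; returns (theta_max, theta_min)."""
    R = math.atan2(ay, ax)
    t = 0.5 * math.atan(math.tan(2 * R) * math.cos(phi))
    other = t + math.pi / 2  # the second solution, 90 degrees away
    if intensity(ax, ay, phi, t) >= intensity(ax, ay, phi, other):
        return t, other
    return other, t


def angle_gap(a, b):
    """Distance between two line directions (angles taken modulo pi)."""
    d = (a - b) % math.pi
    return min(d, math.pi - d)


ax, ay, phi = 0.9, 0.4, 1.0  # illustrative E-state
th_max, th_min = extremal_directions(ax, ay, phi)

# Brute-force scan over filter orientations for comparison.
grid = [-math.pi / 2 + k * math.pi / 200_000 for k in range(200_000)]
scan_max = max(grid, key=lambda th: intensity(ax, ay, phi, th))
scan_min = min(grid, key=lambda th: intensity(ax, ay, phi, th))
print(math.degrees(th_max), math.degrees(scan_max))  # agree to the grid spacing
print(math.degrees(th_min), math.degrees(scan_min))
```

Because $a_x > a_y$ here, the maximum lies within 45° of the x-axis, as stated above.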


We have shown that the directions of maximum and minimum transmission of an E-state through a P-filter are always separated by 90°. Thus, different E-states having the same maximum and minimum transmissions differ (apart from an arbitrary phase factor) only in the orientation of these directions and in the handedness of the state. Without a formal trigonometric proof we can therefore say that for any arbitrary E-state

$$|E\rangle = a_x e^{-i\phi/2}|P_x\rangle + a_y e^{i\phi/2}|P_y\rangle,$$

expressed in a particular coordinate system x, y, there also exists a coordinate system x', y' in which $|E\rangle$ can be written

$$|E\rangle = a_{x'} e^{-i\pi/4}|P_{x'}\rangle + a_{y'} e^{i\pi/4}|P_{y'}\rangle. \qquad (3.10)$$

A simple proof of this statement will be given after the Poincaré sphere has been discussed in Chapter 4. It will also be shown there that for any state $|E\rangle$ there exists a coordinate system x'', y'' in which $|E\rangle$ may be written

$$|E\rangle = \frac{1}{\sqrt{2}}\left\{e^{\mp i\phi/2}|P_{x''}\rangle + e^{\pm i\phi/2}|P_{y''}\rangle\right\}. \qquad (3.11)$$




It is the $\phi$ of Eq. (3.11) which is used to label the states in Fig. 2.4. By analogy with the circular case, the plus and minus signs refer to right and left elliptical polarizations.


3.4 RETARDATION PLATES AND THEIR MATRIX REPRESENTATION 


The derivation in the preceding section of the representation for R- and 
L-states as a superposition of orthogonal P-states is based on the requirement 
that the intensity of the light transmitted by the P-filter be independent of the 
orientation of the filter. This requirement leads to the particular exponen- 
tial phase factor in Eq. (3.9). In this section the physical significance of 
phase factors will be examined in terms of retarders, and it will be shown 
how retarders may be described by matrices. The primary motivation in 
what follows below and in the next section is to show how any polarization 
state can be produced by a device called an elliptical polarizer. We shall 
find, for any state $|E\rangle$, the filter with respect to which it is pure.

It was seen in Chapter 2 that the effect of a retarder on a pure state $|E\rangle$ is mathematically represented by a multiplicative phase factor. Thus, if the state $|E\rangle$ passes through a glass plate, the emergent state is given by $|E'\rangle = e^{iks}|E\rangle$, where $s = t(n - 1)$ is the retardation of the plate, and $n$ and $t$ are the index of refraction and thickness of the plate. If the retardation is increased by an amount $\Delta s$, the emergent state is $|E''\rangle = e^{ik(s + \Delta s)}|E\rangle$.

From Section 2.4 we know that when the increase in retardation is equal to the wavelength $\lambda$, the ket $|E''\rangle$ will be the same as the ket $|E'\rangle$. Since the exponential phase factor is periodic with period $2\pi$, we must have $k = 2\pi/\lambda$, where $k$ is now defined as the wave number. As a result we can express the light emerging from a retarder in terms of the wavelength of the light and the thickness and index of refraction of the plate in the form

$$|E'\rangle = e^{i(2\pi/\lambda)t(n - 1)}|E\rangle.$$
Let us now insert a birefringent plate in a beam which is written as a coherent superposition of two P-states $|P_x\rangle$ and $|P_y\rangle$ as

$$|E\rangle = A|P_x\rangle + B|P_y\rangle,$$

where $A$ and $B$ may be complex. We insert the plate so that its fast axis is parallel to the x-axis and its slow axis is along the y-axis. We call $n_s$ and $n_f$ the indices of refraction along the slow and fast directions. After passing through the plate each component has undergone a retardation. If the incident beam is a pure $|P_x\rangle$ state, then the transmitted state is $|P_x'\rangle$, given by

$$|P_x'\rangle = e^{ikt(n_f - 1)}|P_x\rangle.$$

If the incident state is a pure $|P_y\rangle$ state, then the emergent state is

$$|P_y'\rangle = e^{ikt(n_s - 1)}|P_y\rangle.$$




If the incident state is the superposition given above, then the emergent state must be given by

$$|E'\rangle = A e^{ikt(n_f - 1)}|P_x\rangle + B e^{ikt(n_s - 1)}|P_y\rangle, \qquad (3.12)$$

which may be written in standard form

$$|E'\rangle = e^{i\psi}\left[A e^{-i\delta/2}|P_x\rangle + B e^{i\delta/2}|P_y\rangle\right], \qquad (3.13)$$

where

$$\psi = kt\left(\frac{n_s + n_f}{2} - 1\right), \qquad \delta = kt(n_s - n_f).$$

The quantity $\delta$ in Eq. (3.13) we call the retardance of the birefringent plate. From Eq. (3.12) we see that the components of $|E\rangle$ have different retardations, and we will call the component whose line is along the fast axis of the plate the fast component and the other the slow component. The difference in retardations, $t(n_s - n_f)$, is often used to classify birefringent retardation plates by expressing this retardation difference as a fraction of a wavelength. Thus, we have a half-wave plate when $t(n_s - n_f) = \lambda/2$ and $\delta = \pi$, or a quarter-wave plate when $t(n_s - n_f) = \lambda/4$ and $\delta = \pi/2$.

A matrix which describes a retarder can now be written without much effort. Let the state $|E\rangle$ be written in column-vector representation as

$$|E\rangle = A|P_x\rangle + B|P_y\rangle = \begin{pmatrix} A \\ B \end{pmatrix}.$$

If the light in this state is incident on a retarder plate, the emergent state is given by

$$|E'\rangle = e^{i\psi}\begin{pmatrix} A e^{-i\delta/2} \\ B e^{i\delta/2} \end{pmatrix}.$$

We now wish to find a matrix operator to represent the effect of the plate. That is, we wish to find a matrix $M$ such that

$$|E'\rangle = M|E\rangle.$$

It is obvious from Eq. (3.13) that, ignoring the factor $e^{i\psi}$,

$$M = \begin{pmatrix} e^{-i\delta/2} & 0 \\ 0 & e^{i\delta/2} \end{pmatrix}. \qquad (3.14)$$

The matrix $M$ given above is a valid representation of the retarder only if the fast and slow axes of the plate are along the x- and y-axes, respectively.
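To connect Eqs. (3.12) to (3.14), the sketch below computes the two phase factors $e^{ikt(n-1)}$ for a hypothetical quarter-wave plate (the indices and wavelength are made up for illustration, and the helper names are hypothetical) and checks that they equal $e^{i\psi}$ times the diagonal entries of $M$.

```python
import cmath
import math


def plate_phases(wavelength, thickness, n_fast, n_slow):
    """Phase factors e^{ikt(n-1)} acquired by the fast (x) and slow (y) components."""
    k = 2 * math.pi / wavelength
    return (cmath.exp(1j * k * thickness * (n_fast - 1)),
            cmath.exp(1j * k * thickness * (n_slow - 1)))


def retarder_matrix(delta):
    """Eq. (3.14): fast axis along x, slow axis along y, overall phase e^{i psi} dropped."""
    return [[cmath.exp(-0.5j * delta), 0j], [0j, cmath.exp(0.5j * delta)]]


# Hypothetical plate: indices and wavelength chosen only for illustration.
wavelength, n_fast, n_slow = 589.3e-9, 1.50, 1.51
t = wavelength / (4 * (n_slow - n_fast))   # quarter-wave thickness
k = 2 * math.pi / wavelength
delta = k * t * (n_slow - n_fast)          # retardance, Eq. (3.13)
psi = k * t * ((n_slow + n_fast) / 2 - 1)  # overall phase, dropped in Eq. (3.14)

px, py = plate_phases(wavelength, t, n_fast, n_slow)
M = retarder_matrix(delta)
print(delta / math.pi)                          # 0.5 up to rounding: a quarter-wave plate
print(abs(px - cmath.exp(1j * psi) * M[0][0]))  # ~0: fast component is e^{i psi} e^{-i delta/2}
print(abs(py - cmath.exp(1j * psi) * M[1][1]))  # ~0: slow component is e^{i psi} e^{+i delta/2}
```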




Fig. 3.4. The orientation of the fast and slow axes, x’ and y’, of a retarder plate. 


If the fast and slow axes of the plate are interchanged, it is seen that the correct matrix is

$$M^* = \begin{pmatrix} e^{i\delta/2} & 0 \\ 0 & e^{-i\delta/2} \end{pmatrix}. \qquad (3.15)$$


What is the representation of the retarder when its fast axis makes an angle $\theta$ with the x-axis? This representation can be obtained in the following way. We rotate the retarder plate until its fast axis makes an angle $\theta$ with the x-axis. Let x' and y' be the new directions of the fast and slow axes of the plate. We take an arbitrary state $|E\rangle$ expressed in terms of $|P_x\rangle$ and $|P_y\rangle$ and express it in terms of $|P_{x'}\rangle$ and $|P_{y'}\rangle$. In the primed system the effect of the retarder, after rotation, is given by $M$, since the axes of the retarder are along the axes of the primed coordinate system. After applying the retarder matrix in the primed system, we then transform back to the original system. Looking down the beam toward the retarder, the geometry is as shown in Fig. 3.4. The beam direction is perpendicular to the page. Expressing the same state $E$ in both systems,

$$|E\rangle = A|P_x\rangle + B|P_y\rangle,$$
$$|E\rangle = A'|P_{x'}\rangle + B'|P_{y'}\rangle,$$

and, taking the bracket product of both equations with $\langle P_{x'}|$ and $\langle P_{y'}|$, we obtain

$$A' = A\langle P_{x'}|P_x\rangle + B\langle P_{x'}|P_y\rangle,$$
$$B' = A\langle P_{y'}|P_x\rangle + B\langle P_{y'}|P_y\rangle.$$


From the interpretation of the bracket product we know that the amplitude for the detection of an unprimed P-state in a primed P-state is the cosine of the angle between the lines of these states. Therefore, we may write

$$A' = A\cos\theta + B\sin\theta,$$
$$B' = -A\sin\theta + B\cos\theta.$$

It is readily seen that the above equations may be written in matrix form as

$$|E'\rangle = R|E\rangle,$$

where the rotation matrix $R$ (by which the components of the ket vector in the primed system are expressed in terms of the components of the ket vector in the unprimed system) is given by

$$R = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}. \qquad (3.16)$$

The state vector in the primed system after passing through the retarder plate is then given by $M|E'\rangle = MR|E\rangle$. To express this in the original system we must transform back to that system by using $R^{-1}$. This matrix, $R^{-1}$, represents a rotation through an angle $-\theta$ and, hence, is equal to $R$ with $\theta$ replaced by $-\theta$. Finally, the state $|E''\rangle$ obtained after transmitting a state $|E\rangle$ through a retarder plate whose fast axis makes an angle $\theta$ with the x-axis is given by $|E''\rangle = M'|E\rangle$, where $M' = R^{-1}MR$. The explicit result is


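$$M' = \begin{pmatrix} e^{-i\delta/2}\cos^2\theta + e^{i\delta/2}\sin^2\theta & -2i\sin(\delta/2)\cos\theta\sin\theta \\ -2i\sin(\delta/2)\cos\theta\sin\theta & e^{-i\delta/2}\sin^2\theta + e^{i\delta/2}\cos^2\theta \end{pmatrix}. \qquad (3.17)$$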


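As an example, consider a state $|P_y\rangle$ incident on a quarter-wave plate whose fast axis is midway between the $x$- and $y$-axes. For this case $\delta = \pi/2$ and $\theta = \pi/4$; therefore,
$$M' = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & -i \\ -i & 1 \end{pmatrix}.$$
The emergent state is obtained as
$$|E''\rangle = M'\begin{pmatrix} 0 \\ 1 \end{pmatrix} = \frac{1}{\sqrt{2}}\begin{pmatrix} e^{-i\pi/2} \\ 1 \end{pmatrix} = e^{-i\pi/4}\,\frac{1}{\sqrt{2}}\begin{pmatrix} e^{-i\pi/4} \\ e^{i\pi/4} \end{pmatrix},$$
which is identified as an R-state.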
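A quick numerical check of the construction $M' = R^{-1}MR$ may help. The following is a minimal sketch in plain Python (standard library only); the helper names are our own, kets are two-element lists of complex amplitudes in the $(P_x, P_y)$ basis, and $|R\rangle$ is taken as $(e^{-i\pi/4}, e^{i\pi/4})/\sqrt{2}$, the form used in the example above. It compares the matrix product with the closed form of Eq. (3.17) and checks that the quarter-wave example sends $|P_y\rangle$ to $|R\rangle$ up to a phase.

```python
import cmath
import math


def matmul(A, B):
    # product of two 2x2 complex matrices stored as nested lists
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def apply(A, v):
    # act with a 2x2 matrix on a two-component ket
    return [A[0][0] * v[0] + A[0][1] * v[1], A[1][0] * v[0] + A[1][1] * v[1]]


def retarder(delta):
    # fast axis along x: Eq. (3.17) evaluated at theta = 0
    return [[cmath.exp(-1j * delta / 2), 0], [0, cmath.exp(1j * delta / 2)]]


def rotation(theta):
    # R: components in the unprimed frame -> components in the primed frame
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s], [-s, c]]


def rotated_retarder(delta, theta):
    # M' = R^{-1} M R, using R^{-1} = R evaluated at -theta
    return matmul(rotation(-theta), matmul(retarder(delta), rotation(theta)))


def eq_3_17(delta, theta):
    # closed form of Eq. (3.17), for comparison
    c, s = math.cos(theta), math.sin(theta)
    em, ep = cmath.exp(-1j * delta / 2), cmath.exp(1j * delta / 2)
    off = -2j * math.sin(delta / 2) * c * s
    return [[em * c * c + ep * s * s, off], [off, em * s * s + ep * c * c]]


# quarter-wave plate (delta = pi/2) with its fast axis at theta = pi/4
Mp = rotated_retarder(math.pi / 2, math.pi / 4)
E_out = apply(Mp, [0, 1])                        # incident |P_y>
R_ket = [cmath.exp(-1j * math.pi / 4) / math.sqrt(2),
         cmath.exp(1j * math.pi / 4) / math.sqrt(2)]
overlap = sum(r.conjugate() * e for r, e in zip(R_ket, E_out))
print(abs(overlap))   # 1 up to rounding: E_out is |R> times a phase factor
```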


3.5 AN ELLIPTICAL POLARIZER 


We are now in a position to describe a device consisting of three elements which can produce any E-state. Our elliptical polarizer, or E-filter, simply consists of a P-filter placed between two retarder plates. Figure 3.5 shows such an E-filter with the light incident from the left. The retarder plates are




Fig. 3.5. An elliptical polarizer consisting of two retarder plates, Q and Q’, and 
a P-filter. 


indicated by Q and Q', and the P-filter by P. The retarder Q is oriented 
such that its slow and fast axes are parallel to the x- and y-axes, respectively. 
The line of the P-filter makes an angle 0 with the x-axis, and the fast axis 
of Q' is parallel to the slow axis of Q. 

No matter what the state of the incident light (polarized or unpolarized), 
only light in a P-state will be transmitted by the P-filter, and we write for 
this transmitted state 


$$|P\rangle = A_x|P_x\rangle + A_y|P_y\rangle,$$
where $A_x$ and $A_y$ are real, and the angle the P-filter makes with the $x$-axis is given by $\tan\theta = A_y/A_x$. The retardance of the plate Q' is $\delta$, and since the fast axis of Q' is along the $x$-axis, we find by using Eq. (3.14) that the light emerges from Q' in the state


$$|E\rangle = M|P\rangle = A_x e^{-i\delta/2}|P_x\rangle + A_y e^{i\delta/2}|P_y\rangle, \qquad (3.18)$$


where $M$ is given by Eq. (3.14) and $|E\rangle$ is a right elliptical state. If left states are desired, Q and Q' are interchanged. If the emergent state is incident on an E-filter identical to that which we have described, the retarder Q of the second E-filter, whose matrix is given by Eq. (3.15), produces the state


$$|P\rangle = M^*|E\rangle = A_x|P_x\rangle + A_y|P_y\rangle,$$


which is transmitted completely by the P-filter. The final retarder Q', represented by $M$, gives the emergent state $M|P\rangle$, which is identical to that given by Eq. (3.18). For the particular state $|E\rangle$, the effects of the two retarders cancel each other, and since the P-filter has no effect, the emergent state is simply the incident state. It should be evident that no E-state other than that given by Eq. (3.18) will be transmitted unaltered, since no other




E-state will emerge from the first retarder as a P-state whose line is exactly at $\theta = \tan^{-1}(A_y/A_x)$.

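We may now write the matrix $T_e$ for the elliptical polarizer as a product of the matrices for the retarders and the P-filter:
$$T_e = MTM^* = \begin{pmatrix} e^{-i\delta/2} & 0 \\ 0 & e^{i\delta/2} \end{pmatrix}\begin{pmatrix} \cos^2\theta & \cos\theta\sin\theta \\ \cos\theta\sin\theta & \sin^2\theta \end{pmatrix}\begin{pmatrix} e^{i\delta/2} & 0 \\ 0 & e^{-i\delta/2} \end{pmatrix} = \begin{pmatrix} \cos^2\theta & e^{-i\delta}\cos\theta\sin\theta \\ e^{i\delta}\cos\theta\sin\theta & \sin^2\theta \end{pmatrix},$$
where $M$ and $M^*$ are given by Eqs. (3.14) and (3.15) and $T$ is the transmission matrix of the P-filter. It is left as an exercise to show that the transmission matrix for a P-filter whose line makes an angle $\theta$ with the $x$-axis is given by $T$.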
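The claim that the E-filter passes the state of Eq. (3.18) unchanged, and blocks the orthogonal state, can be checked numerically. This is a sketch under the conventions of the matrices above (retarders as diagonal phase matrices, the P-filter as $T$); `e_filter` is a hypothetical helper name, the values of $\delta$ and $\theta$ are arbitrary, and the orthogonal state uses the form given in Problem 3.13.

```python
import cmath
import math


def matmul(A, B):
    # product of two 2x2 complex matrices stored as nested lists
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def apply(A, v):
    return [A[0][0] * v[0] + A[0][1] * v[1], A[1][0] * v[0] + A[1][1] * v[1]]


def e_filter(delta, theta):
    # T_e = M T M*: light meets Q (M*), then the P-filter (T), then Q' (M)
    M = [[cmath.exp(-1j * delta / 2), 0], [0, cmath.exp(1j * delta / 2)]]
    M_star = [[cmath.exp(1j * delta / 2), 0], [0, cmath.exp(-1j * delta / 2)]]
    c, s = math.cos(theta), math.sin(theta)
    T = [[c * c, c * s], [c * s, s * s]]           # P-filter whose line is at theta
    return matmul(M, matmul(T, M_star))


delta, theta = 0.9, 0.5                            # arbitrary illustrative values
Te = e_filter(delta, theta)
Ax, Ay = math.cos(theta), math.sin(theta)          # so that tan(theta) = Ay / Ax
E = [Ax * cmath.exp(-1j * delta / 2), Ay * cmath.exp(1j * delta / 2)]        # Eq. (3.18)
E_perp = [Ay * cmath.exp(-1j * delta / 2), -Ax * cmath.exp(1j * delta / 2)]  # orthogonal to E

passed = apply(Te, E)         # expected to reproduce E
blocked = apply(Te, E_perp)   # expected to vanish
print(max(abs(p - e) for p, e in zip(passed, E)), max(abs(b) for b in blocked))
```

Because $T_e$ is a projection onto the state of Eq. (3.18), applying it twice gives the same matrix, which the test below also checks.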

In practice it is possible to construct a general elliptical filter using quarter-wave plates rather than general retarder plates. Recall that in Section 3.4 it was stated that any state such as that given by Eq. (3.18) could be written
$$|E\rangle = A_{x'}e^{-i\pi/4}|P_{x'}\rangle + A_{y'}e^{i\pi/4}|P_{y'}\rangle$$
in some coordinate system $x'$, $y'$.

Now, it should be apparent that in this coordinate system the matrix for the elliptical polarizer will have the same form as it had in the old system if it is rewritten in terms of $\theta' = \tan^{-1}(A_{y'}/A_{x'})$ and $\delta'/2 = \tfrac12(\pi/2) = \pi/4$. The matrix in the $(x', y')$-system is thus
$$T'_e = \begin{pmatrix} \cos^2\theta' & -i\cos\theta'\sin\theta' \\ i\cos\theta'\sin\theta' & \sin^2\theta' \end{pmatrix}.$$

From the forms of the matrices $T_e$ and $T'_e$ it is seen that $T'_e$ can be taken to represent two quarter-wave plates separated by a P-filter. The slow axis of the first plate and the fast axis of the second plate would lie along the $x'$-axis. The line of the P-filter would make an angle $\theta'$ with the $x'$-axis.

Thus, a general elliptical polarizer may be constructed using quarter-wave plates and a P-filter if we allow the entire filter to be rotated about the axis of the beam.


3.6 TRANSMISSION MATRICES IN OTHER BASES 


Some examples of the type of problem which will be of concern in this section 
have already been given, but these were specific instances of a rather broad 
class of transformations. The transformations in the last section could be 
visualized in terms of an actual rotation of an E-filter about the optic axis. 
In many cases the relation of the transformations with a physical rotation 
is not clear, if indeed there is such a relation. We now pause and solve the 




problem in more general terms. If the transmission matrix of a given device 
is known when the state of light entering the device is expressed as a super- 
position of a given pair of orthonormal kets, what is the corresponding 
transmission matrix when the entrant state is expressed in terms of a different 
set of orthonormal kets? 

In what follows, sets of orthonormal kets such as $|P_x\rangle, |P_y\rangle$ or $|R\rangle, |L\rangle$ will be referred to as basic kets because they provide a basis in which any polarization state may be represented. Also, in order to minimize notation difficulties, any two different sets of basic kets will be labeled $|\phi_i\rangle$ and $|\psi_i\rangle$, with $i = 1, 2$, and the kets $|\psi\rangle$ and $|\phi\rangle$ which are superpositions of these basic kets will be said to be in the $\psi$- and $\phi$-bases, respectively. For example, if the E-state in question is an R-state, and $|\psi_i\rangle = |R\rangle, |L\rangle$; $|\phi_i\rangle = |P_x\rangle, |P_y\rangle$, then
$$|R\rangle = |\psi\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \qquad |R\rangle = |\phi\rangle = \frac{1}{\sqrt{2}}\begin{pmatrix} e^{-i\pi/4} \\ e^{i\pi/4} \end{pmatrix}.$$


The matrix which transforms a ket in the $\psi$-basis to one in the $\phi$-basis is taken as $U$. Let $|\psi\rangle$ and $|\phi\rangle$ be two different expansions for the same polarization state of light incident on an optical device, and assume that the transmission matrix $M$ corresponding to the $\psi$-expansion is known. We can then write
$$|\psi'\rangle = M|\psi\rangle, \qquad (3.19)$$
$$|\phi\rangle = U|\psi\rangle, \qquad (3.20)$$
where a primed symbol inside a ket indicates the emergent state.


To determine the transmission matrix $M'$ in the $\phi$-representation, we multiply Eq. (3.19) by $U$ and obtain
$$|\phi'\rangle = U|\psi'\rangle = UM|\psi\rangle, \qquad (3.21)$$
and from Eq. (3.20) we have $|\psi\rangle = U^{-1}|\phi\rangle$, which is substituted in Eq. (3.21) to give
$$|\phi'\rangle = UMU^{-1}|\phi\rangle.$$
We are assuming for the moment that $U$ is unitary, i.e., $U^{-1} = U^\dagger$; this will be explicitly demonstrated later. As a result, the transmission matrix $M'$ in the $\phi$-basis is
$$M' = UMU^\dagger. \qquad (3.22)$$


Before proceeding to some examples of this matrix transformation, it 
is convenient to have available a powerful identity which is useful in 




transforming kets from one basis to another. Let us expand any ket $|\phi\rangle$ in terms of the basic kets $|\phi_i\rangle$:


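$$|\phi\rangle = \langle\phi_1|\phi\rangle\,|\phi_1\rangle + \langle\phi_2|\phi\rangle\,|\phi_2\rangle = |\phi_1\rangle\langle\phi_1|\phi\rangle + |\phi_2\rangle\langle\phi_2|\phi\rangle,$$
which can be written
$$|\phi\rangle = \Big[\sum_i |\phi_i\rangle\langle\phi_i|\Big]\,|\phi\rangle. \qquad (3.23)$$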


It is seen that the operator in the brackets leaves $|\phi\rangle$ unchanged and acts as an identity operator or matrix. In the above manipulation we expanded in the $|\phi_i\rangle$, but we did not have to give explicit column vectors for these kets. For example, if $|\phi_1\rangle$ is $|R\rangle$, we have not said whether we wish to write
$$|\phi_1\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$$
or
$$|\phi_1\rangle = \frac{1}{\sqrt{2}}\begin{pmatrix} e^{-i\pi/4} \\ e^{i\pi/4} \end{pmatrix}.$$
Thus, Eq. (3.23) is actually independent of which basis, or which representation of that basis, we use, and we can write


$$|E\rangle = \sum_i |\phi_i\rangle\langle\phi_i|E\rangle, \qquad (3.24)$$
where no particular basis is explicitly indicated for the ket $|E\rangle$. If actual column vectors are used in Eq. (3.24), the same representation must, of course, be used throughout the equation. Finally, we can identify
$$1 = \sum_i |\phi_i\rangle\langle\phi_i|, \qquad (3.25)$$
and if the $|\psi_i\rangle$ are any other set of orthonormal kets,
$$1 = \sum_i |\psi_i\rangle\langle\psi_i|,$$
where $1$ is taken to be an identity operator or identity matrix.

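We may now find the matrix $U$ which transforms the ket $|E\rangle = |\psi\rangle$, written in the $\psi$-basis, to the ket $|E\rangle = |\phi\rangle$ of the same state written in the $\phi$-basis. Using the identity operator, we explicitly write
$$|\phi\rangle = \sum_i |\phi_i\rangle\langle\phi_i|E\rangle, \qquad (3.26)$$
$$|\psi\rangle = \sum_j |\psi_j\rangle\langle\psi_j|E\rangle. \qquad (3.27)$$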


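Placing Eq. (3.27) into Eq. (3.26), we obtain
$$|\phi\rangle = \sum_i |\phi_i\rangle\langle\phi_i|\sum_j |\psi_j\rangle\langle\psi_j|E\rangle = \sum_{ij}\langle\phi_i|\psi_j\rangle\langle\psi_j|E\rangle\,|\phi_i\rangle, \qquad (3.28)$$
giving
$$\langle\phi_i|E\rangle = \sum_j \langle\phi_i|\psi_j\rangle\langle\psi_j|E\rangle. \qquad (3.29)$$
But $\langle\phi_i|E\rangle$ is the $i$-th component of $|E\rangle$ in the $\phi$-basis and $\langle\psi_j|E\rangle$ is the $j$-th component of $|E\rangle$ in the $\psi$-basis. Using explicit vector representations for these states, Eq. (3.29) allows us to write
$$\begin{pmatrix} \langle\phi_1|E\rangle \\ \langle\phi_2|E\rangle \end{pmatrix} = \begin{pmatrix} \langle\phi_1|\psi_1\rangle & \langle\phi_1|\psi_2\rangle \\ \langle\phi_2|\psi_1\rangle & \langle\phi_2|\psi_2\rangle \end{pmatrix}\begin{pmatrix} \langle\psi_1|E\rangle \\ \langle\psi_2|E\rangle \end{pmatrix}.$$
Thus, the element $U_{ij}$ of the transformation matrix is
$$U_{ij} = \langle\phi_i|\psi_j\rangle. \qquad (3.30)$$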


The explicit form of $U$ having been obtained, it is left as an exercise to prove that $U$ is unitary, $U^\dagger = U^{-1}$, as was assumed.
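Eq. (3.30) translates almost literally into code. The sketch below uses our own helper names, represents kets as lists of complex components in the $(P_x, P_y)$ basis, and assumes $|R\rangle = (e^{-i\pi/4}, e^{i\pi/4})/\sqrt{2}$ and $|L\rangle = (e^{i\pi/4}, e^{-i\pi/4})/\sqrt{2}$. It builds $U$ for the change from the $(P_x, P_y)$ basis to the $(R, L)$ basis, checks $UU^\dagger = 1$ for this one case (a numerical spot check, not the proof the exercise asks for), and converts the components of $|R\rangle$.

```python
import cmath
import math


def bracket(u, v):
    # <u|v>: conjugate the components of the bra
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))


def transformation(phi_basis, psi_basis):
    # Eq. (3.30): U_ij = <phi_i|psi_j>, mapping psi-components to phi-components
    return [[bracket(p, q) for q in psi_basis] for p in phi_basis]


def apply(A, v):
    return [sum(A[i][j] * v[j] for j in range(2)) for i in range(2)]


s2 = math.sqrt(2)
Px, Py = [1, 0], [0, 1]
R = [cmath.exp(-1j * math.pi / 4) / s2, cmath.exp(1j * math.pi / 4) / s2]
L = [cmath.exp(1j * math.pi / 4) / s2, cmath.exp(-1j * math.pi / 4) / s2]

U = transformation([R, L], [Px, Py])      # psi = (P_x, P_y), phi = (R, L)
# U U^dagger, which should be the 2x2 identity for a unitary U
UUd = [[sum(U[i][k] * U[j][k].conjugate() for k in range(2)) for j in range(2)]
       for i in range(2)]
print(apply(U, R))   # components of |R> in the (R, L) basis, close to [1, 0]
```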
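As an example, consider the retarder matrix
$$M = \begin{pmatrix} e^{-i\delta/2} & 0 \\ 0 & e^{i\delta/2} \end{pmatrix},$$
which corresponds to a retarder when the entrant state of light is expressed as a superposition of $|P_x\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and $|P_y\rangle = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$. What is the retarder matrix which must be used when the entrant light is expressed as a superposition of $|R\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and $|L\rangle = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$? Let the basic kets of our $\psi$-representation be $|P_x\rangle, |P_y\rangle$ and those of the $\phi$-representation be $|R\rangle$ and $|L\rangle$. The transformation matrix $U$ becomes
$$U = \begin{pmatrix} \langle R|P_x\rangle & \langle R|P_y\rangle \\ \langle L|P_x\rangle & \langle L|P_y\rangle \end{pmatrix} = \frac{1}{\sqrt{2}}\begin{pmatrix} e^{i\pi/4} & e^{-i\pi/4} \\ e^{-i\pi/4} & e^{i\pi/4} \end{pmatrix}, \qquad (3.31)$$
where the bracket products are evaluated by use of Eq. (3.9). The calculation of $M'$ in Eq. (3.22) gives the result
$$M' = UMU^\dagger = \begin{pmatrix} \cos\delta/2 & \sin\delta/2 \\ -\sin\delta/2 & \cos\delta/2 \end{pmatrix}.$$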
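The result $M' = UMU^\dagger$ can be confirmed numerically. A minimal sketch, assuming the entries of $U$ in Eq. (3.31) are $e^{\pm i\pi/4}/\sqrt{2}$ as written above and using an arbitrary retardance; the helper names are ours.

```python
import cmath
import math


def matmul(A, B):
    # product of two 2x2 complex matrices stored as nested lists
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def dagger(A):
    # conjugate transpose of a 2x2 matrix
    return [[complex(A[j][i]).conjugate() for j in range(2)] for i in range(2)]


def linear_retarder(delta):
    # fast axis along x, written in the (P_x, P_y) basis
    return [[cmath.exp(-1j * delta / 2), 0], [0, cmath.exp(1j * delta / 2)]]


s2 = math.sqrt(2)
U = [[cmath.exp(1j * math.pi / 4) / s2, cmath.exp(-1j * math.pi / 4) / s2],
     [cmath.exp(-1j * math.pi / 4) / s2, cmath.exp(1j * math.pi / 4) / s2]]  # Eq. (3.31)

delta = 1.2                                                    # arbitrary retardance
M_RL = matmul(U, matmul(linear_retarder(delta), dagger(U)))   # Eq. (3.22)
expected = [[math.cos(delta / 2), math.sin(delta / 2)],
            [-math.sin(delta / 2), math.cos(delta / 2)]]
print(max(abs(M_RL[i][j] - expected[i][j]) for i in range(2) for j in range(2)))
```

The matrix comes out real even though both $U$ and $M$ are complex, in agreement with the closed form.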




3.7 OPTICAL ACTIVITY 


The retarders which have been used until now possess the property of resolving 
a light beam into two orthogonal P-states and transmitting these orthogonal 
states at different speeds. Such retarders are said to be linear retarders. Just 
as linear retarders resolve beams into P-states, there exist circular retarders 
which resolve beams into R- and L-states, which are transmitted with 
different speeds. Such materials, as we have mentioned before, are said to 
be optically active. We show in what follows that different transmission 
velocities for R- and L-states result in the rotation of the line of a P-state 
transmitted through the material. 

It has been shown in Section 3.4 that the linear retarder introduces a 
phase difference between the components of a state expressed as a super- 
position of P-states. Similarly, if a state is expressed as a superposition of 
$|R\rangle$ and $|L\rangle$,
$$|E\rangle = A|R\rangle + B|L\rangle,$$
a phase difference between the components of the state is introduced when light in this state passes through an optically active substance. If this phase difference is $\beta$, the emergent state can be expressed as
$$|E'\rangle = Ae^{-i\beta/2}|R\rangle + Be^{i\beta/2}|L\rangle.$$


The retarder matrix for the optically active material expressed in an (R, L)- 
basis is therefore 


$$N = \begin{pmatrix} e^{-i\beta/2} & 0 \\ 0 & e^{i\beta/2} \end{pmatrix}. \qquad (3.32)$$


By analogy with linear retarders as discussed in Section 3.4, it can be seen that in the case of optical activity
$$\beta = k\ell(n_L - n_R), \qquad (3.33)$$
where $k$ is the wave number, $\ell$ is the thickness of the material, and the indices of refraction for the R- and L-states are $n_R$ and $n_L$, respectively.

Since the matrix $U$ of Eq. (3.31) transforms from the $(P_x, P_y)$-basis to the $(R, L)$-basis, $U^\dagger = U^{-1}$ transforms from the $(R, L)$-basis to the $(P_x, P_y)$-basis.

Therefore, the matrix $N$ expressed in the $(P_x, P_y)$-basis is given by
$$N' = U^\dagger NU = \begin{pmatrix} \cos\beta/2 & -\sin\beta/2 \\ \sin\beta/2 & \cos\beta/2 \end{pmatrix}.$$


As a special case let the incident beam be the P-state
$$|P\rangle = \cos\theta|P_x\rangle + \sin\theta|P_y\rangle,$$




where the line of the P-state is at angle $\theta$ with the $x$-axis. After traversing the optically active material the emergent state is given by
$$N'|P\rangle = \begin{pmatrix} \cos(\theta + \beta/2) \\ \sin(\theta + \beta/2) \end{pmatrix},$$


which is simply a P-state whose line has rotated through an angle $\beta/2$ relative to the line of the entrant P-state. Thus, if the phase difference is positive, the rotation is in a counterclockwise sense with the beam coming toward an observer. If $\beta$ is negative, the rotation is in a clockwise sense.
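The rotation of the line through $\beta/2$ can be reproduced by forming $N' = U^\dagger NU$ numerically and applying it to a P-state. A sketch, with $U$ taken from Eq. (3.31), illustrative values of $\theta$ and $\beta$, and our own helper names:

```python
import cmath
import math


def matmul(A, B):
    # product of two 2x2 complex matrices stored as nested lists
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def dagger(A):
    # conjugate transpose of a 2x2 matrix
    return [[complex(A[j][i]).conjugate() for j in range(2)] for i in range(2)]


def apply(A, v):
    return [A[0][0] * v[0] + A[0][1] * v[1], A[1][0] * v[0] + A[1][1] * v[1]]


s2 = math.sqrt(2)
U = [[cmath.exp(1j * math.pi / 4) / s2, cmath.exp(-1j * math.pi / 4) / s2],
     [cmath.exp(-1j * math.pi / 4) / s2, cmath.exp(1j * math.pi / 4) / s2]]  # Eq. (3.31)


def optical_activity(beta):
    # N' = U^dagger N U, with N = diag(e^{-i beta/2}, e^{i beta/2}) in the (R, L) basis
    N = [[cmath.exp(-1j * beta / 2), 0], [0, cmath.exp(1j * beta / 2)]]
    return matmul(dagger(U), matmul(N, U))


theta, beta = 0.3, 0.8                      # illustrative line angle and phase difference
out = apply(optical_activity(beta), [math.cos(theta), math.sin(theta)])
angle = math.atan2(out[1].real, out[0].real)
print(angle, theta + beta / 2)              # the line has turned through beta/2
```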


3.8 DICHROIC POLARIZERS 


The birefringent materials which we have been discussing so far have all 
been assumed to be transparent. Although the polarization state may have 
been modified during transmission, the total incident energy flux was trans- 
mitted and not absorbed. We will now examine the cases of isotropic and 
anisotropic absorption, and derive the equation, given in Section 2.9, which 
is a generalization of Malus’ law for imperfect polarizers. 

The simplest nontransparent case is that of an isotropic uniform absorber 
such as a piece of evenly smoked glass or a partially silvered mirror. In such 
cases a certain fraction of the light incident on the plate will be transmitted 
independent of the polarization state. The remaining fraction will either be 
absorbed within the material, thus raising its temperature, or reflected into 
another beam. It is fairly obvious that if k is the fraction of the incident 
energy which is transmitted, then the matrix which represents transmission 
through the plate is given by 


$$|E'\rangle = \sqrt{k}\,|E\rangle = \begin{pmatrix} \sqrt{k} & 0 \\ 0 & \sqrt{k} \end{pmatrix}\begin{pmatrix} A \\ B \end{pmatrix}.$$


The interpretation of the bracket product $\langle E'|E'\rangle$ will clearly depend on the normalization of $|E\rangle$, since $\langle E'|E'\rangle = k\langle E|E\rangle$. If $\langle E|E\rangle = 1$, then $k$ is the probability that a photon will be transmitted. If the normalization is such that the ket representing the incident beam is $\sqrt{N}|E\rangle$ or $\sqrt{I_0}|E\rangle$, where $N$ is the number of incident photons per unit time and $I_0$ is the intensity of the incident beam, then the bracket product $\langle E'|E'\rangle$ is interpreted as the number of photons per unit time that are transmitted or the intensity of the transmitted beam. In the following we adopt a probability normalization.
Anisotropy in materials is exhibited in absorption phenomena as well as in transmission. Just as birefringent materials have fast and slow axes, linear dichroic materials have two principal absorption directions. A P-state whose line is parallel to one of these directions is found to undergo a maximum absorption; when its line is perpendicular to this




[Figure: principal axes of the first plate and of the second plate.]
Fig. 3.6. Illustrating the geometry of the principal axes of a pair of imperfect polarizers.


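direction, the state suffers a minimum absorption. If we express an arbitrary state $|E\rangle$ in terms of two P-states whose directions are along the maximum and minimum transmission directions of a dichroic plate, the transmission matrix is given by
$$|E'\rangle = \begin{pmatrix} \sqrt{k_1} & 0 \\ 0 & \sqrt{k_2} \end{pmatrix}\begin{pmatrix} A \\ B \end{pmatrix} = \begin{pmatrix} \sqrt{k_1}\,A \\ \sqrt{k_2}\,B \end{pmatrix}.$$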


A dichroic sheet would be a perfect linear polarizer if either $k_1$ or $k_2$ were zero and the other equal to unity.

As an example, consider the transmission of a pure P-state through a pair of identical partial polarizers whose absorption axes make an angle $\theta$ with each other. The geometry is as shown in Fig. 3.6, and the incident light is perpendicular to the planes of the parallel polarizers. Let us first express the incident P-state in terms of $|P_x\rangle$ and $|P_y\rangle$ along the axes of the first plate. This is given by


$$|P\rangle = \cos\phi\,|P_x\rangle + \sin\phi\,|P_y\rangle.$$


Applying the transmission matrix of the first plate to this beam gives the 
state of the transmitted beam as 


$$|P'\rangle = \sqrt{k_1}\cos\phi\,|P_x\rangle + \sqrt{k_2}\sin\phi\,|P_y\rangle.$$


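We can now rewrite this in terms of $|P_x\rangle$ and $|P_y\rangle$ along the axes of the second plate by using the rotation matrix for angle $\theta$:
$$\begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}\begin{pmatrix} \sqrt{k_1}\cos\phi \\ \sqrt{k_2}\sin\phi \end{pmatrix} = \begin{pmatrix} \sqrt{k_1}\cos\theta\cos\phi + \sqrt{k_2}\sin\theta\sin\phi \\ -\sqrt{k_1}\sin\theta\cos\phi + \sqrt{k_2}\cos\theta\sin\phi \end{pmatrix}.$$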




Applying the transmission matrix of the second plate to this result yields the 
state of the light transmitted by both polarizers expressed in terms of P-states 
along the second plate’s axes as 


$$|P''\rangle = \begin{pmatrix} k_1\cos\theta\cos\phi + \sqrt{k_1k_2}\,\sin\theta\sin\phi \\ -\sqrt{k_1k_2}\,\sin\theta\cos\phi + k_2\cos\theta\sin\phi \end{pmatrix}.$$


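:">
We now compute $\langle P''|P''\rangle$:
$$\langle P''|P''\rangle = k_1^2\cos^2\theta\cos^2\phi + k_2^2\cos^2\theta\sin^2\phi + 2\big[k_1\sqrt{k_1k_2} - k_2\sqrt{k_1k_2}\big]\sin\theta\cos\theta\sin\phi\cos\phi + k_1k_2\sin^2\theta.$$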


If the incident beam is a completely random incoherent superposition of P-states equally distributed angularly from $\phi = 0$ to $\phi = \pi$, the transmission is given by the average of the above expression over $\phi$ from $0$ to $\pi$. Since
$$\frac{1}{\pi}\int_0^\pi \cos\phi\sin\phi\,d\phi = 0, \qquad \frac{1}{\pi}\int_0^\pi \sin^2\phi\,d\phi = \frac{1}{\pi}\int_0^\pi \cos^2\phi\,d\phi = \frac{1}{2},$$
we obtain
$$\langle P''|P''\rangle_{\mathrm{average}} = \tfrac12(k_1^2 + k_2^2)\cos^2\theta + k_1k_2\sin^2\theta,$$
which may be considered as a generalization of Malus' law for the transmission of an unpolarized beam through a pair of imperfect polarizers. Comparing this result with Eq. (2.5), one may obtain the major and minor transmittances in terms of $k_1$ and $k_2$.
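The averaging step can be checked by brute force: propagate P-states at many angles $\phi$ through the two plates and average $\langle P''|P''\rangle$. A sketch with illustrative values of $k_1$, $k_2$, $\theta$ (not data from the text); the helper names are hypothetical.

```python
import math


def two_plate_transmission(k1, k2, theta, phi):
    # <P''|P''> for a P-state at angle phi after two identical partial polarizers
    # first plate: amplitudes sqrt(k1), sqrt(k2) along its own principal axes
    x, y = math.sqrt(k1) * math.cos(phi), math.sqrt(k2) * math.sin(phi)
    # rotate into the principal axes of the second plate (angle theta)
    x, y = (math.cos(theta) * x + math.sin(theta) * y,
            -math.sin(theta) * x + math.cos(theta) * y)
    # second plate, then the squared norm
    x, y = math.sqrt(k1) * x, math.sqrt(k2) * y
    return x * x + y * y


def averaged(k1, k2, theta, n=360):
    # average over phi in (0, pi) using n equally spaced midpoints; the integrand is
    # a constant plus cos(2 phi) and sin(2 phi) terms, so this equals the integral
    # average up to rounding
    return sum(two_plate_transmission(k1, k2, theta, (m + 0.5) * math.pi / n)
               for m in range(n)) / n


def generalized_malus(k1, k2, theta):
    return 0.5 * (k1 ** 2 + k2 ** 2) * math.cos(theta) ** 2 + k1 * k2 * math.sin(theta) ** 2


k1, k2, theta = 0.9, 0.05, math.radians(30)      # illustrative partial polarizer
print(averaged(k1, k2, theta), generalized_malus(k1, k2, theta))
```

For ideal polarizers ($k_1 = 1$, $k_2 = 0$) the formula reduces to $\tfrac12\cos^2\theta$, which is Malus' law for an unpolarized incident beam.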


PROBLEMS 


3.1 Given the states below, with $\langle R|R\rangle = \langle L|L\rangle = 1$,
$$|E_1\rangle = 3e^{i\pi/3}|R\rangle + 2e^{-i\pi/3}|L\rangle,$$
$$|E_2\rangle = 2e^{-i\pi/2}|R\rangle + 3e^{i\pi/2}|L\rangle,$$

a) normalize each state so that $\langle E_1|E_1\rangle = \langle E_2|E_2\rangle = 1$;

b) evaluate the products $\langle E_1|E_2\rangle$ and $|E_1\rangle\langle E_2|$;

c) what is the probability that a photon in the state $|E_1\rangle$ will be transmitted by the filter which produces state $|E_2\rangle$?


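Answer: (a) $1/\sqrt{13}$ for each state; (b) $\langle E_1|E_2\rangle = -6\sqrt{3}/13$ and
$$|E_1\rangle\langle E_2| = \frac{1}{13}\begin{pmatrix} 6e^{i5\pi/6} & 9e^{-i\pi/6} \\ 4e^{i\pi/6} & 6e^{-i5\pi/6} \end{pmatrix};$$
(c) $108/169$.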


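3.2 The states $|E_1\rangle$ and $|E_2\rangle$ given below form an orthonormal pair:
$$|E_1\rangle = \frac{1}{\sqrt{2}}\big[e^{-i\pi/8}|P_x\rangle + e^{i\pi/8}|P_y\rangle\big],$$
$$|E_2\rangle = \frac{1}{\sqrt{2}}\big[e^{3i\pi/8}|P_x\rangle + e^{-3i\pi/8}|P_y\rangle\big].$$
If the right circular state $|R\rangle$ is expanded as
$$|R\rangle = A|E_1\rangle + B|E_2\rangle,$$
find the constants $A$ and $B$.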


Answer: $\cos(\pi/8)$, $-\sin(\pi/8)$.


3.3 What is the probability that the left circular state $|L\rangle$ will be transmitted by a filter which always transmits state $|E_1\rangle$ of Problem 3.2 and never transmits state $|E_2\rangle$ of that problem?


Answer: $\cos^2(3\pi/8)$.


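3.4 If the states $|P_x\rangle$ and $|P_y\rangle$ are used as a basis, show that the transmission matrix for a P-filter whose line makes an angle $\theta$ with the $x$-axis is given by
$$\begin{pmatrix} \cos^2\theta & \cos\theta\sin\theta \\ \cos\theta\sin\theta & \sin^2\theta \end{pmatrix}.$$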


3.5 A matrix which is often useful is the operator $|E\rangle\langle E|/\langle E|E\rangle$. Show that when it operates on the ket $|E\rangle$, the result is the identical ket $|E\rangle$. If the states $|E_1\rangle$ and $|E_2\rangle$ are an orthonormal pair, show that the matrix of the filter which produces $|E_1\rangle$ is given by $|E_1\rangle\langle E_1|$. If $|E\rangle$ is any polarization state incident on the filter which produces state $|E_1\rangle$, show that the probability that $|E\rangle$ will be transmitted by the filter is given by
$$|\langle E_1|E\rangle|^2 = \langle E|M|E\rangle,$$
where $M = |E_1\rangle\langle E_1|$.


3.6 If the matrix $N$ corresponding to a polarizing device is unitary, $N^\dagger N = 1$, what does this imply about the transmission properties of $N$? Is the matrix for a polarizing filter unitary? Is the matrix of a retarder plate unitary?


3.7 The intensity of a state
$$|E\rangle = A_x e^{-i\phi/2}|P_x\rangle + A_y e^{i\phi/2}|P_y\rangle$$
is given by $I_0 = \langle E|E\rangle = A_x^2 + A_y^2$. What is the intensity of this state after transmission through a half-wave plate whose slow axis is along the $x$-axis? What is the intensity transmitted by a P-filter whose line is at angle $\theta$ with the $x$-axis? What is the intensity of the beam after it goes through the retarder and then through the P-filter? What is its intensity after going through the P-filter and then through the retarder?




Answer: $I_0$;
$A_x^2\cos^2\theta + A_y^2\sin^2\theta + 2A_xA_y\cos\theta\sin\theta\cos\phi$;
$A_x^2\cos^2\theta + A_y^2\sin^2\theta - 2A_xA_y\cos\theta\sin\theta\cos\phi$;
$A_x^2\cos^2\theta + A_y^2\sin^2\theta + 2A_xA_y\cos\theta\sin\theta\cos\phi$.


3.8 Light in an R-state strikes a 3/4-wave plate and the transmitted light encounters a P-filter followed by a half-wave plate. Determine the intensity and state of the final beam. Let both plates have their slow axes in the $x$-direction and consider the cases where the line of the P-filter is at $\pm 45^\circ$ to the $x$-axis.

Answer: $0$, $I_0$.

3.9 Determine the emergent state in each of the following cases. 

a) A P-state incident on a quarter-wave plate with the line of the P-state mid- 
way between the fast and slow axes of the plate. 

b) A P-state incident on a half-wave plate with the line of the P-state midway 
between the fast and slow axes of the plate. 

c) R- and L-states incident on a quarter-wave plate. 

d) R- and L-states incident on a half-wave plate. 


Answer: R or L, the orthogonal P state, P states, L or R. 


3.10 Show that the transmission matrix for the P-filter of Problem 3.4 expressed in the basis $|R\rangle, |L\rangle$ is as given in Appendix II.


3.11 Show that the matrix for an optically active circular retarder expressed in the basis $|P_x\rangle, |P_y\rangle$ is as given in Appendix II.


3.12 Show that the matrix operator for a half-wave plate whose fast axis makes an angle of 45° with the $x$-axis is given by
$$\begin{pmatrix} 0 & -i \\ -i & 0 \end{pmatrix}.$$
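3.13 If a general polarization state is given by
$$|E\rangle = A_x e^{-i\phi/2}|P_x\rangle + A_y e^{i\phi/2}|P_y\rangle,$$
show that the orthogonal state $|E'\rangle$, $\langle E'|E\rangle = 0$, is given by
$$|E'\rangle = A_y e^{-i\phi/2}|P_x\rangle - A_x e^{i\phi/2}|P_y\rangle.$$


In the case $|E\rangle = |R\rangle$, $|E'\rangle = |L\rangle$, show that the above forms are consistent with those given by Eq. (3.9).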


3.14 Following Eq. (3.16), the transformation for a matrix from the basis $|P_x\rangle, |P_y\rangle$ to the basis $|P_{x'}\rangle, |P_{y'}\rangle$ was given as
$$M' = R^{-1}MR = R^\dagger MR.$$
Why does this not have the form of Eq. (3.22),
$$M' = UMU^{-1}?$$


3.15 Prove that the matrix $U$ whose elements are given by Eq. (3.30) is unitary.


CHAPTER 4 


THE STOKES PARAMETERS 


Development of another description of polarization states and optical systems might well raise the question, "Why bother?" One reason we proceed to do this is that it will lead to a different insight into polarization states. Furthermore, the formalism in the preceding chapter is incapable of handling an important class of problems dealing with the incoherent superposition of light beams. In other words, how can states of light be described which are not pure in relation to any E-filter? What may be surprising is that this new formalism (the Stokes vectors and Mueller matrices), which is the offspring of the formalism of the preceding chapter (the Jones calculus), gives a description of these "nonpure states" which its parent is incapable of grasping. What is more important, however, is that the description of both pure and nonpure states will open a path which can lead to the formulation and methods of quantum mechanics.


4.1 THE PROJECTION OPERATOR 


Consider the pure state $|E\rangle$, normalized such that $\langle E|E\rangle = 1$, and let us construct the $2 \times 2$ matrix $|E\rangle\langle E|$, known as the projection operator. Clearly, the matrix elements of the projection operator depend on the particular ket from which it is formed. To distinguish projection operators, we shall write the symbol $M(E)$, where the letter in parentheses refers to the ket which determines the matrix elements of a particular projection operator.
There are three observations about M(E) we wish to make immediately. 
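First, it is a Hermitian operator; that is,
$$M(E) = |E\rangle\langle E| = \big(|E\rangle\langle E|\big)^\dagger = M^\dagger(E). \qquad (4.1)$$
Second, if $|E'\rangle$ is any other pure state, then
$$M(E)|E'\rangle = |E\rangle\langle E|E'\rangle = \langle E|E'\rangle\,|E\rangle,$$
$$M^2(E)|E'\rangle = |E\rangle\langle E|E\rangle\langle E|E'\rangle = \langle E|E'\rangle\,|E\rangle. \qquad (4.2)$$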




Consequently, operating with $M(E)$ any number of times on a ket yields the same result. Any operator such that, as in this case, $M^2(E) = M(E)$ is said to be idempotent. Finally, each member of the set of all possible polarization states corresponds to a definite projection operator. If $|E\rangle$ and $|E'\rangle$ are different polarization states, then $M(E) \neq M(E')$, and there exists a one-to-one correspondence between pure polarization states and projection operators. This correspondence is truly one-to-one, however, only if arbitrary phase factors are ignored; for if $|E'\rangle = e^{i\alpha}|E\rangle$, then
$$|E'\rangle\langle E'| = |E\rangle\langle E|.$$
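The three properties of $M(E)$ listed above are easy to confirm for a sample state. A minimal sketch; the amplitudes and the phase $\alpha = 1.3$ are arbitrary illustrative choices, and the helper names are ours.

```python
import cmath


def outer(u, v):
    # |u><v| as a 2x2 matrix
    return [[u[i] * complex(v[j]).conjugate() for j in range(2)] for i in range(2)]


def matmul(A, B):
    # product of two 2x2 complex matrices stored as nested lists
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def close(A, B, tol=1e-12):
    return all(abs(A[i][j] - B[i][j]) < tol for i in range(2) for j in range(2))


E = [0.6, 0.8 * cmath.exp(0.7j)]                       # normalized: 0.36 + 0.64 = 1
P = outer(E, E)                                        # M(E) = |E><E|
P_dag = [[P[j][i].conjugate() for j in range(2)] for i in range(2)]

hermitian = close(P, P_dag)                            # Eq. (4.1)
idempotent = close(matmul(P, P), P)                    # M^2(E) = M(E)
E_phase = [cmath.exp(1.3j) * c for c in E]             # same state times e^{i alpha}
phase_blind = close(outer(E_phase, E_phase), P)
print(hermitian, idempotent, phase_blind)   # True True True
```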


We wish now to prove a relationship which will be of considerable use in 
the following sections. Assume a pure state |E» and a matrix operator A 
which represents an optical device. We want to prove that 


$$\langle E|A|E\rangle = \operatorname{Tr}[M(E)A]. \qquad (4.3)$$
In terms of the familiar matrix operators of the last chapter, this means that if $A$ is a matrix representing a retarder, linear polarizer, etc., then $A|E\rangle$ generally results in a polarization state different from $|E\rangle$, say $|E'\rangle$. The amplitude that the state $|E'\rangle$ will be transmitted through a filter which completely transmits the state $|E\rangle$, namely $\langle E|E'\rangle = \langle E|A|E\rangle$, is given by the trace (Tr) of the product of the projection operator and the matrix operator for the device. In order to prove Eq. (4.3), we will evaluate directly both members of the equation.
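Let us first expand $|E\rangle$ in terms of a basic set of kets $|\phi_i\rangle$. Let
$$|E\rangle = \sum_i |\phi_i\rangle\langle\phi_i|E\rangle, \qquad \langle E| = \sum_j \langle E|\phi_j\rangle\langle\phi_j|, \qquad (4.4)$$
yielding
$$\langle E|A|E\rangle = \sum_{ij} \langle E|\phi_i\rangle\langle\phi_i|A|\phi_j\rangle\langle\phi_j|E\rangle. \qquad (4.5)$$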


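To evaluate the right-hand member of Eq. (4.3), we proceed as follows:
$$M(E) = |E\rangle\langle E| = \sum_{ij} \langle\phi_i|E\rangle\langle E|\phi_j\rangle\,|\phi_i\rangle\langle\phi_j|.$$
Multiplying on the right by $A$ and then by the identity $\sum_k |\phi_k\rangle\langle\phi_k|$,
$$M(E)A = \sum_{ijk} \langle\phi_i|E\rangle\langle E|\phi_j\rangle\,|\phi_i\rangle\langle\phi_j|A|\phi_k\rangle\langle\phi_k|,$$
$$M(E)A = \sum_{ijk} \langle\phi_i|E\rangle\langle E|\phi_j\rangle\langle\phi_j|A|\phi_k\rangle\,|\phi_i\rangle\langle\phi_k|. \qquad (4.6)$$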


All the bracket products in Eq. (4.6) are scalars, and the coefficient of the term $|\phi_i\rangle\langle\phi_k|$ is simply the element in the $i$-th row and $k$-th column of the matrix $M(E)A$. Since the $|\phi_i\rangle$ are orthonormal, taking the trace of Eq. (4.6)




involves summing only the diagonal elements of $M(E)A$, for which $i = k$; therefore
$$\operatorname{Tr}[M(E)A] = \sum_{ij} \langle\phi_i|E\rangle\langle E|\phi_j\rangle\langle\phi_j|A|\phi_i\rangle, \qquad (4.7)$$
and comparison with Eq. (4.5) gives Eq. (4.3).
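Eq. (4.3) can be spot-checked with an arbitrary $2 \times 2$ matrix standing in for a device. The matrix entries below are made up for illustration and do not describe any particular optical element; the helper names are ours.

```python
def matmul(A, B):
    # product of two 2x2 complex matrices stored as nested lists
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def trace(A):
    return A[0][0] + A[1][1]


def expectation(E, A):
    # <E|A|E> computed directly from the components
    AE = [A[0][0] * E[0] + A[0][1] * E[1], A[1][0] * E[0] + A[1][1] * E[1]]
    return sum(complex(e).conjugate() * x for e, x in zip(E, AE))


E = [0.6, 0.8j]                                         # a normalized pure state
M_E = [[E[i] * complex(E[j]).conjugate() for j in range(2)] for i in range(2)]  # |E><E|
A = [[0.5, 0.2 - 0.1j], [0.3j, -0.4 + 0.7j]]            # arbitrary stand-in "device"
lhs = expectation(E, A)                                 # left side of Eq. (4.3)
rhs = trace(matmul(M_E, A))                             # right side of Eq. (4.3)
print(lhs, rhs)
```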


4.2 THE STOKES PARAMETERS


The ket vectors and 2 x 2-matrix operators which we have been manipulating 
were chosen such that they correspond to experimental data. Evidently, if 
we are to develop a different mathematical representation, it likewise must 
correspond to such data. How can we ensure this? In principle, we simply 
need to find a set of quantities which are in one-to-one correspondence with 
the ket vectors. The projection operators fulfil this requirement and the 
implications of this correspondence will now be examined. 

The projection operator is a 2 x 2-matrix, and can be expressed as a 
linear combination of other 2 x 2-matrices as 


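$$M(E) = \tfrac12\,[P_0\sigma_0 + P_1\sigma_1 + P_2\sigma_2 + P_3\sigma_3], \qquad (4.8)$$
where $P_0$ and $P_i$ ($i = 1, 2, 3$) are scalars, and the $\sigma_i$ are linearly independent, Hermitian, and unitary matrices given by
$$\sigma_0 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad \sigma_1 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}, \quad \sigma_2 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad \sigma_3 = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}. \qquad (4.9)$$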


The factor of 1/2 in Eq. (4.8) is included so that later results will conform 
to standard notation. Except for a permutation of indices, these matrices 
are most often referred to as the Pauli spin matrices, which are frequently 
used in atomic and nuclear physics. For the subscript order as given above, 
we shall simply refer to them as sigma matrices. The sigma matrices have 
the following properties, which are easily verified: 


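$$\sigma_i^2 = \sigma_0, \qquad \sigma_i^\dagger = \sigma_i, \qquad \sigma_i\sigma_j = i\sigma_k,$$
$$\operatorname{Tr}(\sigma_i\sigma_j) = 2\delta_{ij}, \qquad \sigma_i\sigma_j = -\sigma_j\sigma_i \quad (i \neq j).$$
The subscripts $i, j, k$ take on the values 1, 2, 3 in cyclic order (the trace relation also holds when either index is 0), and the Kronecker delta symbol is defined by
$$\delta_{ij} = \begin{cases} 1, & i = j, \\ 0, & i \neq j. \end{cases}$$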
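The listed sigma-matrix identities can be verified exhaustively, since each involves only a handful of $2 \times 2$ products. A sketch using the ordering $\sigma_1 = \mathrm{diag}(1, -1)$, $\sigma_2$ real and off-diagonal, $\sigma_3$ with the $\mp i$ entries, which is our reading of Eq. (4.9); the helper names are ours.

```python
import itertools

SIGMA = [
    [[1, 0], [0, 1]],      # sigma_0
    [[1, 0], [0, -1]],     # sigma_1
    [[0, 1], [1, 0]],      # sigma_2
    [[0, -1j], [1j, 0]],   # sigma_3
]


def matmul(A, B):
    # product of two 2x2 matrices stored as nested lists
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def trace(A):
    return A[0][0] + A[1][1]


def equal(A, B):
    # every entry is 0, +-1 or +-i, so exact comparison is safe here
    return all(A[i][j] == B[i][j] for i in range(2) for j in range(2))


def scale(c, A):
    return [[c * A[i][j] for j in range(2)] for i in range(2)]


def dagger(A):
    return [[complex(A[j][i]).conjugate() for j in range(2)] for i in range(2)]


herm_ok = all(equal(dagger(s), s) for s in SIGMA)
squares_ok = all(equal(matmul(s, s), SIGMA[0]) for s in SIGMA[1:])
cyclic_ok = all(equal(matmul(SIGMA[i], SIGMA[j]), scale(1j, SIGMA[k]))
                for i, j, k in [(1, 2, 3), (2, 3, 1), (3, 1, 2)])
anti_ok = all(equal(matmul(SIGMA[i], SIGMA[j]), scale(-1, matmul(SIGMA[j], SIGMA[i])))
              for i, j in itertools.permutations([1, 2, 3], 2))
trace_ok = all(trace(matmul(SIGMA[i], SIGMA[j])) == (2 if i == j else 0)
               for i in range(4) for j in range(4))
print(herm_ok, squares_ok, cyclic_ok, anti_ok, trace_ok)   # True True True True True
```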


Since $M(E)$ and the sigma matrices are Hermitian, the scalar coefficients of the sigma matrices must be real, for
$$M(E) = \tfrac12\sum_i P_i\sigma_i, \qquad M^\dagger(E) = \tfrac12\sum_i P_i^*\sigma_i \qquad (i = 0, 1, 2, 3),$$




so we must have, since $M(E)$ is Hermitian,
$$\sum_i P_i\sigma_i = \sum_i P_i^*\sigma_i.$$
Multiplying both members of the preceding equation by $\sigma_j$ and taking the trace of both members yields $P_j = P_j^*$. Consequently, when $M(E)$ is expressed as a linear combination of the sigma matrices, the projection operator is uniquely determined by the four real numbers $(P_0, P_1, P_2, P_3)$ which determine the polarization state. When the projection operator $M(E)$ is expressed in terms of $|P_x\rangle$ and $|P_y\rangle$ as a basis, these real parameters are called the Stokes parameters, and they allow a description of light beams in terms of $4 \times 1$ column vectors.

At this point it is easy to show how the Stokes parameters can be 
calculated for any pure polarization state. Assume the pure state 


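$$|E\rangle = A|P_x\rangle + B|P_y\rangle; \qquad A = ae^{-i\phi/2}, \quad B = be^{i\phi/2}, \qquad (4.10)$$
with $a$, $b$, and $\phi$ real, which has the projection operator
$$M(E) = \begin{pmatrix} AA^* & AB^* \\ BA^* & BB^* \end{pmatrix} = \begin{pmatrix} a^2 & ab\,e^{-i\phi} \\ ab\,e^{i\phi} & b^2 \end{pmatrix}.$$
By using Eq. (4.3) and the relation $\operatorname{Tr}(\sigma_i\sigma_j) = 2\delta_{ij}$, we obtain the results
$$\operatorname{Tr}[M(E)\sigma_i] = \tfrac12\operatorname{Tr}\sum_j \sigma_j\sigma_i P_j = P_i,$$
$$\langle E|\sigma_i|E\rangle = P_i. \qquad (4.11)$$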

This not only makes the calculation of the Stokes parameters rather 
easy, but, used in conjunction with Eq. (4.3), enables us to obtain a very 
useful relation between the Stokes parameters. By Eqs. (4.3) and (4.8) we 
can write 

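$$|E\rangle\langle E| = \tfrac12\sum_i P_i\sigma_i,$$
$$\langle E|E\rangle\langle E|E\rangle = \tfrac12\sum_i P_i\langle E|\sigma_i|E\rangle = \tfrac12\sum_i \langle E|\sigma_i|E\rangle\langle E|\sigma_i|E\rangle,$$
$$P_0^2 = \tfrac12\sum_{i=0}^{3} P_i^2,$$
$$P_0^2 = P_1^2 + P_2^2 + P_3^2, \qquad (4.12)$$
which is a result which will form the basis of our later discussion of the Poincaré sphere and is satisfied only for pure states.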

The operational significance of these parameters becomes apparent when 
they are calculated directly from the 2 x 2-matrix M(E). By use of Eq. 
(4.10) and Eq. (4.11) the Stokes parameters can be shown to be 


$$P_0 = a^2 + b^2, \qquad P_1 = a^2 - b^2, \qquad P_2 = 2ab\cos\phi, \qquad P_3 = 2ab\sin\phi. \qquad (4.13)$$
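Eqs. (4.11) to (4.13) can be tied together numerically: compute $P_i = \langle E|\sigma_i|E\rangle$ for a state of the form of Eq. (4.10), compare with the closed forms, and check Eq. (4.12). A sketch with arbitrary $a$, $b$, $\phi$ and with the sigma matrices in our reading of Eq. (4.9); `stokes` is our own helper name.

```python
import cmath
import math

# sigma_0 ... sigma_3 in the order of Eq. (4.9) as read here
SIGMA = [[[1, 0], [0, 1]], [[1, 0], [0, -1]], [[0, 1], [1, 0]], [[0, -1j], [1j, 0]]]


def expectation(E, A):
    # <E|A|E> for a two-component ket E and a 2x2 matrix A
    AE = [A[0][0] * E[0] + A[0][1] * E[1], A[1][0] * E[0] + A[1][1] * E[1]]
    return sum(complex(e).conjugate() * x for e, x in zip(E, AE))


def stokes(E):
    # Eq. (4.11): P_i = <E|sigma_i|E>; the imaginary parts vanish up to rounding
    return [expectation(E, s).real for s in SIGMA]


a, b, phi = 0.8, 0.5, 1.1                                        # illustrative values
E = [a * cmath.exp(-1j * phi / 2), b * cmath.exp(1j * phi / 2)]  # Eq. (4.10)
P = stokes(E)
closed_form = [a * a + b * b, a * a - b * b,
               2 * a * b * math.cos(phi), 2 * a * b * math.sin(phi)]  # Eq. (4.13)
print(P)
print(closed_form)
print(P[0] ** 2 - (P[1] ** 2 + P[2] ** 2 + P[3] ** 2))   # Eq. (4.12): ~0 for a pure state
```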




The intensity of the beam in the state $|E\rangle$ is simply $P_0$, and by adding the first two relations above we find that $P_1 = 2a^2 - P_0$, where $a^2$ is the intensity of the light transmitted by a P-filter whose line is along the $x$-axis. Thus, the Stokes parameter $P_1$ of a beam is twice the intensity transmitted by a $P_x$-filter minus the intensity of the beam. If the beam is analyzed by a P-filter $|P\rangle$ whose line is oriented at 45° relative to the $x$-axis, then we find that
$$|\langle P|E\rangle|^2 = \tfrac12(P_0 + P_2),$$
and $P_2$ is twice the intensity of the transmitted beam minus the total intensity. Similarly, it can be shown that $P_3$ is twice the intensity transmitted by an R-filter minus the total intensity.

Since the Stokes parameters characterize polarization states uniquely, 
it is convenient to write the Stokes parameters as the elements of a column 
matrix ($4 \times 1$) or a row matrix ($1 \times 4$), where the order in which the parameters appear in the matrix will always be maintained as follows:
$$\begin{pmatrix} P_0 \\ P_1 \\ P_2 \\ P_3 \end{pmatrix}, \qquad (P_0, P_1, P_2, P_3).$$


These row and column vectors will hereafter be called Stokes vectors. 
Let us determine the Stokes vectors for some of the familiar polarization 
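states. We assume here that $\langle E|E\rangle = P_0 = 1$. Since R- and L-states are of the form
$$|E\rangle = \frac{1}{\sqrt{2}}\big(e^{\mp i\pi/4}|P_x\rangle + e^{\pm i\pi/4}|P_y\rangle\big)$$
(upper signs for $|R\rangle$, lower signs for $|L\rangle$), we find
$$|R\rangle \rightarrow \begin{pmatrix} 1 \\ 0 \\ 0 \\ +1 \end{pmatrix}, \qquad |L\rangle \rightarrow \begin{pmatrix} 1 \\ 0 \\ 0 \\ -1 \end{pmatrix}. \qquad (4.14)$$
A P-state whose line is at an angle $\theta$ with the $x$-axis has a ket
$$|P\rangle = \cos\theta\,|P_x\rangle + \sin\theta\,|P_y\rangle,$$
and its Stokes vector is
$$\begin{pmatrix} 1 \\ \cos^2\theta - \sin^2\theta \\ 2\cos\theta\sin\theta \\ 0 \end{pmatrix}. \qquad (4.15)$$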




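In particular,
$$|P_x\rangle \rightarrow \begin{pmatrix} 1 \\ 1 \\ 0 \\ 0 \end{pmatrix}, \qquad |P_y\rangle \rightarrow \begin{pmatrix} 1 \\ -1 \\ 0 \\ 0 \end{pmatrix}, \qquad (4.16)$$
and, for $\theta = \pm 45^\circ$,
$$|P\rangle \rightarrow \begin{pmatrix} 1 \\ 0 \\ \pm 1 \\ 0 \end{pmatrix}.$$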
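The same expectation-value routine reproduces the tabulated Stokes vectors. A sketch, assuming $|R\rangle = (e^{-i\pi/4}, e^{i\pi/4})/\sqrt{2}$ and $|L\rangle = (e^{i\pi/4}, e^{-i\pi/4})/\sqrt{2}$ in the $(P_x, P_y)$ basis; the dictionary keys are only labels.

```python
import cmath
import math

# sigma_0 ... sigma_3 in the order of Eq. (4.9) as read here
SIGMA = [[[1, 0], [0, 1]], [[1, 0], [0, -1]], [[0, 1], [1, 0]], [[0, -1j], [1j, 0]]]


def expectation(E, A):
    # <E|A|E> for a two-component ket E and a 2x2 matrix A
    AE = [A[0][0] * E[0] + A[0][1] * E[1], A[1][0] * E[0] + A[1][1] * E[1]]
    return sum(complex(e).conjugate() * x for e, x in zip(E, AE))


def stokes(E):
    # Stokes vector (P_0, P_1, P_2, P_3) of a pure state, Eq. (4.11)
    return [expectation(E, s).real for s in SIGMA]


s2 = math.sqrt(2)
states = {
    "R": [cmath.exp(-1j * math.pi / 4) / s2, cmath.exp(1j * math.pi / 4) / s2],
    "L": [cmath.exp(1j * math.pi / 4) / s2, cmath.exp(-1j * math.pi / 4) / s2],
    "P_x": [1, 0],
    "P_y": [0, 1],
    "P_+45": [1 / s2, 1 / s2],
    "P_-45": [1 / s2, -1 / s2],
}
for name, ket in states.items():
    print(name, [round(p, 12) for p in stokes(ket)])
```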


4.3 THE MUELLER MATRICES 


It has been observed in the preceding section that with each pure state $|E\rangle$ there is associated a four-component vector whose components are the four real Stokes parameters of the state. We now face the following problem. Let $|E\rangle$ be a polarization state of light before it passes through an optical device represented by a $2 \times 2$ matrix $M$. The transmitted state will be $|E'\rangle = M|E\rangle$. If the Stokes vector for $|E\rangle$ has components $P_j$ ($j = 0, 1, 2, 3$), then the Stokes vector for $|E'\rangle$ will have components $P'_i$, and there must be a $4 \times 4$ matrix $\mathcal{M}$ such that
$$\begin{pmatrix} P'_0 \\ P'_1 \\ P'_2 \\ P'_3 \end{pmatrix} = \mathcal{M}\begin{pmatrix} P_0 \\ P_1 \\ P_2 \\ P_3 \end{pmatrix}.$$
How can a $4 \times 4$ matrix representation of $M$ be found?
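We proceed by using the properties of the trace of a matrix and the fact that if $|E'\rangle = M|E\rangle$, then $|E'\rangle\langle E'| = M|E\rangle\langle E|M^\dagger$. From Eq. (4.11),
$$P'_i = \operatorname{Tr}[\sigma_i M|E\rangle\langle E|M^\dagger] = \tfrac12\operatorname{Tr}\Big[\sigma_i M\sum_j \sigma_j P_j M^\dagger\Big] = \tfrac12\sum_j \operatorname{Tr}(\sigma_i M\sigma_j M^\dagger)\,P_j = \sum_j \mathcal{M}_{ij}P_j,$$
where
$$\mathcal{M}_{ij} = \tfrac12\operatorname{Tr}(\sigma_i M\sigma_j M^\dagger). \qquad (4.17)$$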


In computing the $\mathcal{M}_{ij}$ a minor but important point must be made. Because the subscripts on the sigma matrices run 0, 1, 2, 3, rather than from 1 through




4, the matrix elements of $\mathcal{M}$ are not numbered according to the usual convention. The element in the first row and first column, for example, is $\mathcal{M}_{00}$, not $\mathcal{M}_{11}$.

As a simple illustration of the application of the Mueller matrices and 
the Stokes vectors, let us consider the case where right circularly polarized 
light strikes a quarter-wave plate whose fast axis is along the x-axis. In the 
Jones calculus we have 


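$$|E'\rangle = M|R\rangle = \begin{pmatrix} e^{-i\pi/4} & 0\\ 0 & e^{i\pi/4}\end{pmatrix}\frac{1}{\sqrt{2}}\begin{pmatrix} e^{-i\pi/4}\\ e^{i\pi/4}\end{pmatrix} = \frac{i}{\sqrt{2}}\left[-|P_x\rangle + |P_y\rangle\right],$$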


which is a P-state which makes an angle of —45° relative to the fast axis of 
the retarder. From the preceding section and Appendix II, the corresponding 
expression in the Mueller calculus is 


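$$\begin{pmatrix}1\\0\\-1\\0\end{pmatrix} = \begin{pmatrix}1&0&0&0\\0&1&0&0\\0&0&0&-1\\0&0&1&0\end{pmatrix}\begin{pmatrix}1\\0\\0\\1\end{pmatrix},$$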


and the components of the Stokes vector for the emergent light are also 
those for a P-state oriented at —45° relative to the fast axis. 

The construction of the Mueller matrix for a sequence of retarders and 
polarizers is the same as that in the Jones calculus. The matrices for the 
individual elements are multiplied from right to left in the order in which 
the beam encounters the elements. A tabulation of the Mueller matrices for 
a number of different elements is given in Appendix II.
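Eq. (4.17) and the quarter-wave example can be reproduced numerically. The Python sketch below builds the Mueller matrix from a Jones matrix, using the same assumed sigma ordering as the earlier Stokes-vector sketch, and checks that a train of elements can be handled either way: multiply the Jones matrices right to left and convert, or convert each element and multiply the Mueller matrices in the same order. The rotator matrix and its 0.4 rad angle are hypothetical illustrative choices.

```python
import cmath
import math

# Assumed sigma ordering: sigma_1 = diag(1, -1), sigma_2 = [[0, 1], [1, 0]], sigma_3 = [[0, -i], [i, 0]].
SIGMA = [
    [[1, 0], [0, 1]],
    [[1, 0], [0, -1]],
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
]


def matmul(a, b):
    """Product of two square matrices (2x2 or 4x4) stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def dagger(a):
    """Hermitian conjugate of a 2x2 matrix."""
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]


def mueller(m):
    """Mueller matrix via Eq. (4.17): M_ij = (1/2) Tr(sigma_i m sigma_j m^dagger).
    Indices run 0..3, so the top-left element is M_00."""
    md = dagger(m)
    out = []
    for i in range(4):
        row = []
        for j in range(4):
            prod = matmul(matmul(SIGMA[i], m), matmul(SIGMA[j], md))
            row.append(0.5 * (prod[0][0] + prod[1][1]).real)
        out.append(row)
    return out


def apply(mat, vec):
    """4x4 matrix times a 4-component Stokes vector."""
    return [sum(mat[i][j] * vec[j] for j in range(4)) for i in range(4)]


def retarder_x(delta):
    """Linear retarder with fast axis along x, written as in Eq. (3.14)."""
    return [[cmath.exp(-0.5j * delta), 0], [0, cmath.exp(0.5j * delta)]]


def rotator(angle):
    """Hypothetical optically active element that rotates P-states by `angle`."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s], [s, c]]


qwp = retarder_x(math.pi / 2)
emergent = apply(mueller(qwp), [1, 0, 0, 1])      # R-state in
print([round(p, 6) for p in emergent])

# Two-element train: the beam meets `qwp` first, then the rotator.
jones_total = matmul(rotator(0.4), qwp)
mueller_total = matmul(mueller(rotator(0.4)), mueller(qwp))
```

The first print should show (1, 0, −1, 0) up to rounding, the emergent P-state found above; `mueller(jones_total)` and `mueller_total` should agree element by element.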


4.4 PROBABILITY AMPLITUDES IN THE STOKES FORMALISM


In Section 3.3 it was shown that the calculation of probabilities involved the 
bracket product. In view of the correspondence between kets and the Stokes
parameters, we expect that it should also be possible to calculate probabilities
using the Stokes vector rather than kets.

Suppose that |E> and |E'> are normalized so that <E|E> = P_0 and
<E'|E'> = 1. At the beginning of the chapter we proved that if A is a
matrix operator and |E> is a pure state, then, by Eq. (4.3),

<E|A|E> = Tr[M(E)A].




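Now, let A = M(E') = |E'><E'|. Eq. (4.3) becomes

$$\langle E|E'\rangle\langle E'|E\rangle = \mathrm{Tr}\left[M(E)M(E')\right],$$
$$|\langle E'|E\rangle|^2 = \tfrac{1}{4}\,\mathrm{Tr}\Big(\sum_p P_p\sigma_p \sum_q P_q'\sigma_q\Big).$$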


The product of the two sums results in 16 terms, 12 of which involve products
of the form σ_iσ_j (i ≠ j). Since the trace of each of these terms vanishes, while
Tr(σ_jσ_j) = Tr(σ_0) = 2 for each of the remaining four, we have the result


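$$|\langle E'|E\rangle|^2 = \tfrac{1}{2}\sum_j P_j' P_j = \tfrac{1}{2}\,(1, P_1', P_2', P_3')\begin{pmatrix}P_0\\P_1\\P_2\\P_3\end{pmatrix}. \qquad (4.18)$$

Very often the above equation is written more succinctly as

$$|\langle E'|E\rangle|^2 = \tfrac{1}{2}\,\mathbf{P}'^{T}\mathbf{P},$$

where P and P' are the four-component Stokes vectors of |E> and |E'>.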


From the normalization of |E> and |E'>, the left-hand side of Eq. (4.18)
means that if a light beam with intensity P_0 is in a pure state |E>, then the
intensity of the light transmitted by an E'-filter is |<E'|E>|². If the
normalization were such that <E|E> = <E'|E'> = 1, then |<E'|E>|² would be the
probability that a photon in an E-state would traverse an E'-filter. The
right-hand side of Eq. (4.18) shows how the transmitted intensity or trans- 
mission probability may be expressed in terms of the scalar product of Stokes 
vectors. 
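Eq. (4.18) is easy to test: compute |<E'|E>|² from the kets and compare it with half the Stokes-vector product. In this Python sketch the beam (intensity P_0 = 2) and the analysing P-filter at 30° are arbitrary illustrative choices, and the sigma ordering is the same assumption as in the earlier sketches.

```python
import cmath
import math

# Assumed sigma ordering: sigma_1 = diag(1, -1), sigma_2 = [[0, 1], [1, 0]], sigma_3 = [[0, -i], [i, 0]].
SIGMA = [
    [[1, 0], [0, 1]],
    [[1, 0], [0, -1]],
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
]


def stokes(ket):
    """P_j = <E|sigma_j|E>, which equals Tr(sigma_j |E><E|)."""
    return [sum(ket[i].conjugate() * s[i][k] * ket[k]
                for i in range(2) for k in range(2)).real
            for s in SIGMA]


def bracket(bra_state, ket_state):
    """<E'|E> = sum_i conj(E'_i) E_i."""
    return sum(x.conjugate() * y for x, y in zip(bra_state, ket_state))


def transmitted_via_stokes(filter_state, beam_state):
    """Right-hand side of Eq. (4.18): (1/2) sum_j P'_j P_j."""
    return 0.5 * sum(p * q for p, q in zip(stokes(filter_state), stokes(beam_state)))


# Illustrative numbers: a beam of intensity P_0 = 2 in an elliptical state,
# analysed by a normalized P-filter whose line is at 30 degrees.
beam = (math.sqrt(2) * 0.8, math.sqrt(2) * 0.6 * cmath.exp(0.7j))
analyser = (math.cos(math.radians(30)), math.sin(math.radians(30)))

direct = abs(bracket(analyser, beam)) ** 2
via_stokes = transmitted_via_stokes(analyser, beam)
print(direct, via_stokes)
```

The two printed numbers should agree to rounding error; for orthogonal states the Stokes product gives zero, as antipodal Poincaré vectors require.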

Instead of analyzing a given state |E> by an E'-filter, let us allow the
state |E> to first pass through an optical device represented by the
2 × 2-matrix T and then analyze the emergent light by an E'-filter. The
probability in this case is

$$|\langle E'|T|E\rangle|^2,$$


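and an explicit derivation of what this corresponds to in the Stokes vector
formalism is given below:

$$|\langle E'|T|E\rangle|^2 = \langle E'|T|E\rangle\langle E|T^\dagger|E'\rangle = \mathrm{Tr}\left[|E'\rangle\langle E'|T|E\rangle\langle E|T^\dagger\right].$$

We may expand the projection operators in terms of Stokes parameters:

$$|\langle E'|T|E\rangle|^2 = \mathrm{Tr}\Big[\tfrac{1}{2}\sum_i P_i'\sigma_i\, T\, \tfrac{1}{2}\sum_j P_j\sigma_j\, T^\dagger\Big],$$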




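and, since the trace of a sum is the sum of the traces,

$$|\langle E'|T|E\rangle|^2 = \tfrac{1}{4}\sum_{ij} P_i'\left[\mathrm{Tr}\,\sigma_i T\sigma_j T^\dagger\right]P_j = \tfrac{1}{2}\,(1, P_1', P_2', P_3')\,\mathcal{M}\begin{pmatrix}P_0\\P_1\\P_2\\P_3\end{pmatrix}, \qquad (4.19)$$

where, by Eq. (4.17), 𝓜 is the Mueller matrix corresponding to the Jones
matrix T.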
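Eq. (4.19) holds for any 2 × 2 Jones matrix T, absorbing or not, since the derivation used only properties of the trace. This Python sketch compares both sides for a hypothetical partially absorbing T and arbitrary states; none of these numbers come from the text, and the sigma ordering is the same assumption as before.

```python
import cmath
import math

# Assumed sigma ordering: sigma_1 = diag(1, -1), sigma_2 = [[0, 1], [1, 0]], sigma_3 = [[0, -i], [i, 0]].
SIGMA = [
    [[1, 0], [0, 1]],
    [[1, 0], [0, -1]],
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
]


def stokes(ket):
    """P_j = <E|sigma_j|E>."""
    return [sum(ket[i].conjugate() * s[i][k] * ket[k]
                for i in range(2) for k in range(2)).real
            for s in SIGMA]


def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def dagger(a):
    """Hermitian conjugate of a 2x2 matrix."""
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]


def mueller(m):
    """Eq. (4.17): M_ij = (1/2) Tr(sigma_i m sigma_j m^dagger)."""
    md = dagger(m)
    out = []
    for i in range(4):
        row = []
        for j in range(4):
            prod = matmul(matmul(SIGMA[i], m), matmul(SIGMA[j], md))
            row.append(0.5 * (prod[0][0] + prod[1][1]).real)
        out.append(row)
    return out


def matvec(m, v):
    """Jones matrix times Jones vector."""
    return (m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1])


# Hypothetical partially absorbing device and illustrative states.
T = [[0.9, 0.2j], [0.1, 0.5 * cmath.exp(0.3j)]]
beam = (0.8, 0.6 * cmath.exp(1.1j))
analyser = (math.cos(0.5), 1j * math.sin(0.5))

out = matvec(T, beam)
direct = abs(sum(x.conjugate() * y for x, y in zip(analyser, out))) ** 2
p_filter, p_beam, big_m = stokes(analyser), stokes(beam), mueller(T)
via_mueller = 0.5 * sum(p_filter[i] * big_m[i][j] * p_beam[j]
                        for i in range(4) for j in range(4))
print(direct, via_mueller)
```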


4.5 THE POINCARÉ SPHERE


In Section 4.2 it was indicated how the Stokes parameters can be obtained 
for pure states. However, there is one relation between the Stokes parameters 
characterizing a pure state which has not yet been exploited. If we take the 
entire ensemble of pure states, each of which satisfies the normalization 
condition <E|E> = 1, then, according to Eq. (4.12), every such state satisfies 
the relation 


$$1 = P_1^2 + P_2^2 + P_3^2,$$

which should remind us of the equation for a sphere of unit radius. This
sphere, called the Poincaré sphere, is defined by a set of coordinate axes
P_1, P_2, and P_3. Since each pure polarization state has a unique set of Stokes
parameters, this implies that each point on the surface of the sphere
corresponds to a definite pure polarization state. We shall call the vector from
the origin to the point (P_1, P_2, P_3) the Poincaré vector of the polarization
state.

If a pure state |E> can be represented by a point on the Poincaré sphere,
what point on the sphere represents the pure state orthogonal to |E>? We
will now show that mutually orthogonal pure states correspond to antipodal
points on the sphere (i.e., opposite ends of a diameter). For any state

$$|E\rangle = a e^{-i\phi/2}|P_x\rangle + b e^{i\phi/2}|P_y\rangle, \qquad (4.20)$$

where <E|E> = 1, the corresponding orthogonal state is

$$|E'\rangle = b e^{-i\phi/2}|P_x\rangle - a e^{i\phi/2}|P_y\rangle,$$

and, using Eq. (4.13), we see that

$$P_i' = -P_i \qquad (i = 1, 2, 3),$$


and that mutually orthogonal states are represented by antipodal points on 
the sphere. As particular examples we cite the special cases whose Stokes
vectors have been evaluated: |R> and |L>, |P_x> and |P_y>, and the P-states
whose lines are at ±45° relative to the x-axis. These states lie at opposite
ends of the coordinate axes of the sphere, as shown in Fig. 4.1. It now
remains to locate a general elliptical state on the sphere in terms of a, b,
and φ in the expansion given by Eq. (4.20).

Fig. 4.1. The Poincaré sphere.

From Eq. (4.13) we have 


$$a^2 - b^2 = P_1, \qquad P_3/P_2 = \tan\phi,$$

and we see that the coefficients a and b determine a plane in the Poincaré
sphere which is perpendicular to the P_1-axis and intersects the sphere in a
circle. The point on the sphere which represents |E> must lie somewhere on
this circle. Likewise, the phase angle φ must determine a plane which contains
the P_1-axis. This plane makes an angle φ with the P_1P_2-plane and intersects
the sphere in a great circle. The point on the sphere which represents the
state |E> is thus the intersection of these two circles.
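The geometric construction above can be checked numerically. The Python sketch below, with the same assumed sigma ordering as earlier, maps (a, b, φ) of Eq. (4.20) to a Poincaré vector, confirms that the orthogonal state lands at the antipodal point, and inverts a² − b² = P_1 and tan φ = P_3/P_2. The inversion assumes a, b > 0 (real amplitudes as in the text), and the values a = 0.8, b = 0.6, φ = 1 rad are illustrative.

```python
import cmath
import math

# Assumed sigma ordering: sigma_1 = diag(1, -1), sigma_2 = [[0, 1], [1, 0]], sigma_3 = [[0, -i], [i, 0]].
SIGMA = [
    [[1, 0], [0, 1]],
    [[1, 0], [0, -1]],
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
]


def stokes(ket):
    """P_j = <E|sigma_j|E>."""
    return [sum(ket[i].conjugate() * s[i][k] * ket[k]
                for i in range(2) for k in range(2)).real
            for s in SIGMA]


def ket_from_abphi(a, b, phi):
    """Eq. (4.20): a e^{-i phi/2}|P_x> + b e^{i phi/2}|P_y>, with a^2 + b^2 = 1."""
    return (a * cmath.exp(-0.5j * phi), b * cmath.exp(0.5j * phi))


def orthogonal_ket(a, b, phi):
    """b e^{-i phi/2}|P_x> - a e^{i phi/2}|P_y>, orthogonal to ket_from_abphi(a, b, phi)."""
    return (b * cmath.exp(-0.5j * phi), -a * cmath.exp(0.5j * phi))


def abphi_from_point(p1, p2, p3):
    """Invert a^2 - b^2 = P_1 (with a^2 + b^2 = 1) and tan(phi) = P_3 / P_2.
    Assumes a, b > 0 so that atan2 picks the correct quadrant."""
    return math.sqrt((1 + p1) / 2), math.sqrt((1 - p1) / 2), math.atan2(p3, p2)


a, b, phi = 0.8, 0.6, 1.0                        # illustrative values
point = stokes(ket_from_abphi(a, b, phi))[1:]    # Poincare vector (P_1, P_2, P_3)
antipode = stokes(orthogonal_ket(a, b, phi))[1:]
print(point, antipode, abphi_from_point(*point))
```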

The Poincaré sphere has utility in visualizing the outcome of sending a 
polarized beam through an optical device. For example, if a beam in a 
state given by Eq. (4.20) traverses a retarder plate whose fast and slow axes 
are along the x- and y-axes, respectively, the effect of the plate is to produce 
the emergent state 


$$|E'\rangle = a e^{-i(\phi+\delta)/2}|P_x\rangle + b e^{i(\phi+\delta)/2}|P_y\rangle,$$


where δ = kt(n_y − n_x). The effect of the retarder is simply to increase the
phase angle φ by δ. In terms of the Poincaré sphere, this is seen to be a rotation
about the P_1-axis through an angle δ. If the retarder is a quarter-wave plate,
then light in the state |P_x> + |P_y> traversing the retarder will emerge as an
R-state, which is obtained from the Poincaré sphere by rotation about the
P_1-axis through the angle δ = π/2. Note that in our convention all points
on the upper hemisphere correspond to right elliptical states; those on the
bottom hemisphere, to left elliptical states.

Fig. 4.2. Location of a general P-state on the equator of the Poincaré sphere.

We have obtained the Poincaré sphere description of polarization states 
by expressing all such states as superpositions of the orthogonal states
|P_x>, |P_y>. However, nowhere did we say exactly which axes were being
used for x and y; we did not give an absolute orientation for the x- and
y-axes in space. Thus, the Poincaré sphere description is valid so long as
the P_1-axis passes through those points on the sphere which we have chosen
to be |P_x> and |P_y>. Since φ is zero for all P-states, such states must lie on
the great circle in the P_1P_2-plane. If we wish to express our states in terms
of axes x', y' rotated through an angle θ from the original axes x, y, we need
only rotate the P_1-axis about the P_3-direction through an angle 2θ to find
the appropriate new P_1'-axis on the Poincaré sphere. Let us prove this.

In Fig. 4.2 we show the equator of the Poincaré sphere with a general 
P-state indicated on the equator. In real space assume the line of this P-state 
is at angle θ with the x-axis. Thus,

|P> = cos θ|P_x> + sin θ|P_y>.

The Stokes parameters for the state are therefore

$$P_0 = 1,\qquad P_1 = \cos^2\theta - \sin^2\theta = \cos 2\theta,\qquad P_2 = 2\cos\theta\sin\theta = \sin 2\theta,\qquad P_3 = 0.$$


Fig. 4.3. Representation on the Poincaré sphere of the transmission of the state
|E> through an anisotropic retarder whose fast axis makes an angle θ with the x-axis,
and which introduces a retardation δ.


The tangent of γ is given by

tan γ = P_2/P_1 = (sin 2θ)/(cos 2θ) = tan 2θ.

Thus,

γ = 2θ.


We can now easily see how a retarder plate whose axes are not along the
original x- and y-axes affects the location of states on the Poincaré sphere.
Let the retarder impose a phase difference δ and have its fast axis along x'
making an angle θ with x. We find the axis P_1' in the P_1P_2-plane which makes
an angle 2θ with P_1. Wherever the point for the incident state happens to
be on the sphere, the new state is found by rotating the old point through
an angle δ about the line P_1' in this plane. This is illustrated in Fig. 4.3. The
state |E> is transformed into the state |E'>. For example, if an R-state
passes through a quarter-wave plate whose fast axis makes an angle of 67.5°
with the x-axis, the emergent state is seen to be a P-state whose line is at
22.5° relative to the x-axis.
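The rotation rule just stated can be compared with a direct Jones-calculus calculation. This Python sketch rotates the Poincaré vector with Rodrigues' formula about the equatorial axis at 2θ and, separately, applies a rotated retarder whose Jones matrix is taken to be Rot(θ) D(δ) Rot(−θ), with D from Eq. (3.14); that rotated form is an assumption of the sketch rather than a quotation of Eq. (3.17). For the quarter-wave example it should recover the P-state at 22.5°.

```python
import cmath
import math

# Assumed sigma ordering: sigma_1 = diag(1, -1), sigma_2 = [[0, 1], [1, 0]], sigma_3 = [[0, -i], [i, 0]].
SIGMA = [
    [[1, 0], [0, 1]],
    [[1, 0], [0, -1]],
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
]


def stokes(ket):
    """P_j = <E|sigma_j|E>."""
    return [sum(ket[i].conjugate() * s[i][k] * ket[k]
                for i in range(2) for k in range(2)).real
            for s in SIGMA]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def matvec(m, v):
    return (m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1])


def rot(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def rotated_retarder(theta, delta):
    """Assumed Jones matrix of a retarder with fast axis at theta:
    rot(theta) . diag(e^{-i delta/2}, e^{i delta/2}) . rot(-theta)."""
    d = [[cmath.exp(-0.5j * delta), 0], [0, cmath.exp(0.5j * delta)]]
    return matmul(matmul(rot(theta), d), rot(-theta))


def rotate_on_sphere(v, theta, delta):
    """Right-handed rotation of the Poincare vector v through delta about the
    equatorial axis at angle 2*theta from P_1 (Rodrigues' formula)."""
    n = (math.cos(2 * theta), math.sin(2 * theta), 0.0)
    dot = sum(ni * vi for ni, vi in zip(n, v))
    cross = (n[1] * v[2] - n[2] * v[1],
             n[2] * v[0] - n[0] * v[2],
             n[0] * v[1] - n[1] * v[0])
    c, s = math.cos(delta), math.sin(delta)
    return tuple(v[k] * c + cross[k] * s + n[k] * dot * (1 - c) for k in range(3))


R2 = math.sqrt(2)
KET_R = (cmath.exp(-0.25j * math.pi) / R2, cmath.exp(0.25j * math.pi) / R2)
theta, delta = math.radians(67.5), math.pi / 2     # quarter-wave plate at 67.5 degrees

on_sphere = rotate_on_sphere((0.0, 0.0, 1.0), theta, delta)   # start from the R-state
jones_out = matvec(rotated_retarder(theta, delta), KET_R)
line_angle = math.degrees(0.5 * math.atan2(on_sphere[1], on_sphere[0]))
print(on_sphere, stokes(jones_out)[1:], line_angle)
```

Both routes should give the Poincaré vector (1/√2, 1/√2, 0), i.e. a P-state whose line is at 22.5°.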

As another example of visualizing the effect of an optical device on 
polarization states, consider optically active materials. Placing such a device 
in a beam rotates the plane of P-states and in particular the states |P_x> and
|P_y>. It is readily seen that on the Poincaré sphere this amounts to a rotation
about the P_3-axis. Thus, if a given slab of optically active material rotates
|P_x> into a P-state at an angle θ with the x-axis, then when an arbitrary state
|E> passes through the slab, the new state can be found on the Poincaré sphere
by rotating the old state through an angle 2θ about P_3.

Fig. 4.4. Location on the Poincaré sphere of the state |E> when expressed in the
basis |P_u>, |P_v>.

Consider now a series of retarders and slabs of optically active materials. 
Each of these in order produces a rotation on the Poincaré sphere. Although 
the total rotation is not easy to find analytically, any sequence of rotations
about various axes is equivalent to a single rotation about one axis.
In Poincaré space, then, each pure polarization state corresponds to a point 
on the sphere and every nonabsorbing device corresponds to a rotation about 
some axis. 
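The claim that any train of nonabsorbing retarders and optically active slabs amounts to a single rotation can be illustrated numerically. The Python sketch below multiplies the Jones matrices of a hypothetical three-element train, converts the result to a Mueller matrix with Eq. (4.17), and reads off the axis and angle of the 3 × 3 block acting on (P_1, P_2, P_3). The element angles, the rotated-retarder form, and the assumption that the total angle lies strictly between 0 and π are choices made for the sketch.

```python
import cmath
import math

# Assumed sigma ordering: sigma_1 = diag(1, -1), sigma_2 = [[0, 1], [1, 0]], sigma_3 = [[0, -i], [i, 0]].
SIGMA = [
    [[1, 0], [0, 1]],
    [[1, 0], [0, -1]],
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def dagger(a):
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]


def mueller(m):
    """Eq. (4.17): M_ij = (1/2) Tr(sigma_i m sigma_j m^dagger)."""
    md = dagger(m)
    out = []
    for i in range(4):
        row = []
        for j in range(4):
            prod = matmul(matmul(SIGMA[i], m), matmul(SIGMA[j], md))
            row.append(0.5 * (prod[0][0] + prod[1][1]).real)
        out.append(row)
    return out


def rot(angle):
    """Optically active slab rotating P-states by `angle` (also used for axis rotation)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s], [s, c]]


def retarder(theta, delta):
    """Assumed form: rot(theta) diag(e^{-i delta/2}, e^{i delta/2}) rot(-theta)."""
    d = [[cmath.exp(-0.5j * delta), 0], [0, cmath.exp(0.5j * delta)]]
    return matmul(matmul(rot(theta), d), rot(-theta))


def sphere_rotation(jones):
    """3x3 block of the Mueller matrix acting on (P_1, P_2, P_3)."""
    return [row[1:] for row in mueller(jones)[1:]]


def axis_and_angle(r):
    """Axis and angle of a 3x3 rotation matrix (angle strictly between 0 and pi)."""
    v = (r[2][1] - r[1][2], r[0][2] - r[2][0], r[1][0] - r[0][1])
    norm = math.sqrt(sum(x * x for x in v))
    angle = math.atan2(norm / 2, (r[0][0] + r[1][1] + r[2][2] - 1) / 2)
    return tuple(x / norm for x in v), angle


def rotate_about(v, n, angle):
    """Rodrigues' formula: right-handed rotation of v about unit axis n."""
    dot = sum(a * b for a, b in zip(n, v))
    cross = (n[1] * v[2] - n[2] * v[1], n[2] * v[0] - n[0] * v[2], n[0] * v[1] - n[1] * v[0])
    c, s = math.cos(angle), math.sin(angle)
    return tuple(v[k] * c + cross[k] * s + n[k] * dot * (1 - c) for k in range(3))


# Hypothetical train, in the order the beam meets it.
train = [retarder(math.radians(10), math.pi / 2), rot(math.radians(25)),
         retarder(math.radians(70), 1.0)]
total = [[1, 0], [0, 1]]
for element in train:
    total = matmul(element, total)      # later elements multiply from the left

r = sphere_rotation(total)
axis, angle = axis_and_angle(r)
print(axis, math.degrees(angle))
```

For an optically active slab alone, the same routine should return the P_3-axis and twice the slab's rotation angle, as stated in the text.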

A simple proof promised in Chapter 3 can now easily be given.
Let the P_1-axis pass through the Poincaré sphere at the states |P_x> and
|P_y>. Let |P_u> and |P_v> be orthonormal P-states such that |P_u> makes an
angle θ with |P_x>, as shown in Fig. 4.4. Let the arbitrary state |E> be
represented as

$$|E\rangle = a e^{-i\phi/2}|P_u\rangle + b e^{i\phi/2}|P_v\rangle.$$

Then φ is obtained from the Poincaré sphere as shown in Fig. 4.4. It is
obvious that for every state |E>, axes x' and y' exist such that φ = π/2.
Since the distance from the center of the sphere to point P is a² − b², it is
also clear that axes x'' and y'' exist such that a = b. Finally, the angle in
real space between the axes x' and x'' must be 45°, since the two choices
correspond to axes 90° apart on the Poincaré sphere and angles on the sphere
are twice the corresponding angles in real space.




4.6 THE SIGMA MATRICES AS INSTRUMENT OPERATORS 


There are a number of intimately related ways of grasping the physical
significance of the sigma matrices. For the present we will concern
ourselves with the interpretation of the sigma matrices in terms of actual 
physical devices. Since all of our devices are represented by 2 x 2-matrix 
operators which can be expressed as a linear combination of the sigma 
matrices, as in Eq. (4.8), the sigma matrices themselves should represent 
optical devices. The basic devices at our disposal are the linear and circular 
retarders, P-filters, and optically active materials. The linear and circular 
retarder matrices will give us access to the physical interpretation of the 
sigma matrices. For any pure state which is expressed as a coherent
superposition of the states |P_x> and |P_y>, the matrix representation for a
circular retarder, according to Appendix II, is


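$$R = \begin{pmatrix}\cos\beta/2 & -\sin\beta/2\\ \sin\beta/2 & \cos\beta/2\end{pmatrix},$$

where β/2 is the angle through which a P-state is rotated by an optically
active material. When the angle of rotation is such that β = π, then

$$R = \begin{pmatrix}0 & -1\\ 1 & 0\end{pmatrix} = -i\sigma_3.$$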


The only difference between R|E> and σ_3|E> is the phase factor −i;
consequently, σ_3 is an operator which represents a circular retarder that rotates a
P-state through 90°. In terms of the Poincaré sphere, σ_3 represents a rotation
about the P_3-axis through an angle β = π, which implies that the only states
unchanged by the instrument operator σ_3 are the states |R> and |L>.

In the case of the linear retarder whose fast axis is along the x-axis, we 
have, from Eq. (3.14),

$$M = \begin{pmatrix} e^{-i\delta/2} & 0\\ 0 & e^{i\delta/2}\end{pmatrix},$$

where, for a half-wave plate, δ = π. Assume that a half-wave plate is rotated
so that its fast axis is oriented at an angle of +45° relative to the x-axis.
The matrix representation of the rotated half-wave plate for states expressed
as a coherent superposition of the states |P_x> and |P_y> is obtained from
Eq. (3.17):

$$M' = -i\begin{pmatrix}0 & 1\\ 1 & 0\end{pmatrix} = -i\sigma_2,$$


and −iσ_2 is seen to represent a half-wave plate whose fast axis makes an
angle of +45° with the x-axis; it transmits the state |P_x> + |P_y> unchanged
apart from the phase factor −i. It is easy to verify that (−iσ_2)* = iσ_2
represents a half-wave plate whose slow axis makes an angle of +45° with the
x-axis; the operation of complex conjugation simply interchanges the fast and
slow axes.

The matrix M’ representing a half-wave plate whose slow axis is along 
the x-axis is 


$$M' = i\begin{pmatrix}1 & 0\\ 0 & -1\end{pmatrix} = i\sigma_1,$$

and iσ_1 is seen to represent such a retarder, which leaves the states |P_x>
and |P_y> unchanged apart from a phase factor.
The multiplication relation for the sigma matrices 


$$\sigma_i\sigma_j = i\sigma_k \qquad (i, j, k = 1, 2, 3 \text{ in cyclic order})$$

now has an operational significance. For example,

$$(i\sigma_1)(-i\sigma_2) = i\sigma_3$$

means that the effect on light in a pure state intercepted by the retarders
−iσ_2 and iσ_1 in succession is equivalent to the effect when the light passes
through an optically active retarder, iσ_3.
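The identifications in this section can be checked mechanically. This Python sketch verifies the cyclic products σ_iσ_j = iσ_k and compares the circular retarder at β = π and half-wave plates at several orientations with the corresponding multiples of σ_1, σ_2, σ_3. It uses the same assumed sigma ordering and the same assumed rotated-retarder form Rot(θ) D(δ) Rot(−θ) as the earlier sketches.

```python
import cmath
import math

# Assumed sigma ordering: sigma_1 = diag(1, -1), sigma_2 = [[0, 1], [1, 0]], sigma_3 = [[0, -i], [i, 0]].
SIGMA = [
    [[1, 0], [0, 1]],
    [[1, 0], [0, -1]],
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def scale(c, a):
    return [[c * x for x in row] for row in a]


def same(a, b, tol=1e-12):
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))


def rot(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def linear_retarder(theta, delta):
    """Assumed form: fast axis at theta, rot(theta) diag(e^{-i delta/2}, e^{i delta/2}) rot(-theta)."""
    d = [[cmath.exp(-0.5j * delta), 0], [0, cmath.exp(0.5j * delta)]]
    return matmul(matmul(rot(theta), d), rot(-theta))


def circular_retarder(beta):
    """Circular retarder in the form quoted above from Appendix II."""
    c, s = math.cos(beta / 2), math.sin(beta / 2)
    return [[c, -s], [s, c]]


s0, s1, s2, s3 = SIGMA
checks = {
    "sigma_1 sigma_2 = i sigma_3": same(matmul(s1, s2), scale(1j, s3)),
    "sigma_2 sigma_3 = i sigma_1": same(matmul(s2, s3), scale(1j, s1)),
    "sigma_3 sigma_1 = i sigma_2": same(matmul(s3, s1), scale(1j, s2)),
    "circular, beta = pi -> -i sigma_3": same(circular_retarder(math.pi), scale(-1j, s3)),
    "half-wave, fast axis x -> -i sigma_1": same(linear_retarder(0.0, math.pi), scale(-1j, s1)),
    "half-wave, fast axis +45 -> -i sigma_2": same(linear_retarder(math.pi / 4, math.pi), scale(-1j, s2)),
    "half-wave, slow axis x -> i sigma_1": same(linear_retarder(math.pi / 2, math.pi), scale(1j, s1)),
    "(i sigma_1)(-i sigma_2) = i sigma_3": same(matmul(scale(1j, s1), scale(-1j, s2)), scale(1j, s3)),
}
for label, ok in checks.items():
    print(label, ok)
```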


4.7 PARTIAL POLARIZATION 


In our discussion so far, emphasis has been placed on the concept of pure 
states, and this provided the basis for the creation of the Stokes vector space. 
Since the projection operators were constructed from the kets which specify 
pure polarization states, it was assumed that each such state corresponded 
to a definite projection operator and to a definite Stokes vector. In Section 
4.1 the statement was made that there exists a one-to-one correspondence 
between pure polarization states and projection operators. However, this is 
true only if we are talking about pure states. For every pure state there 
exists a corresponding Stokes vector, but the converse proposition, that for 
every Stokes vector there exists a pure state, is not true. In other words, in 
busily constructing the Stokes vector formalism, we have added something 
which is not a member of the set of pure polarization states! 

An example of a Stokes vector which does not correspond to any pure 
polarization state is one in which all the Stokes parameters except P_0 are
zero. It is important to bear in mind that the Stokes parameters can be 
obtained by measurement, as outlined in Section 4.2. Furthermore, we know 
that for pure states, by Eq. (4.12), 


$$1 = P_1^2 + P_2^2 + P_3^2.$$


Any beam of light which does not satisfy Eq. (4.12) is said to be in a nonpure, 
or mixed, state. Specifically, a beam of light which has the measured Stokes 
parameters (I, 0, 0, 0), where I is the intensity of the beam, is said to be
unpolarized. Since the Stokes parameters necessarily involve measurement,
it is reasonable to describe the condition of any light beam in terms of a
degree of polarization, 𝒫, defined as

$$\mathcal{P} = \frac{\sqrt{P_1^2 + P_2^2 + P_3^2}}{P_0}.$$

Fig. 4.5. The incoherent superposition of two light beams.


Any beam whose degree of polarization is less than 1 but greater than 0 is 
said to be partially polarized. Note that the degree of polarization does not 
depend on the particular orientation of the x- and y-axes. Rotating these 
axes will alter the direction of the P_1- and P_2-axes on the Poincaré sphere,
but will not change the intensity of the beam, P_0, or the length of the Poincaré
vector, √(P_1² + P_2² + P_3²).

The problem remains as to how these partially polarized states may be 
encompassed in terms of Stokes vectors. This problem will be examined more 
thoroughly in Chapter 7, but some aspects of it are pertinent here. 

Consider the situation shown in Fig. 4.5. The sources S_1 and S_2 are
two ordinary independent sources; for example, light bulbs. If one wished,
they could be made nearly monochromatic by the addition of filters, although
this would not affect what is to follow. F_1 and F_2 are any two polarizing
filters. M is a half-silvered mirror. At the mirror we are thus combining
two beams, each of which will be partially polarized. We may now block each
beam in turn and measure the Stokes parameters of the emergent beams in
each case, which we call P^1 and P^2. Now, with neither beam blocked, we
may measure the Stokes parameters of the combined beams, which we will
call P^{1+2}. If this experiment is performed, one finds the experimental result
to be 


(4.21) 


$$P^{1+2} = P^1 + P^2 \qquad (4.22)$$




This additivity does not always hold. If the two beams which are combined had
come from the two arms of an interferometer, Eq. (4.22) would not be obeyed.
For the moment let us assume that the two combining beams can be written as
e^{iα}|E_1> and e^{iβ}|E_2>. We have explicitly written phase factors before these
states, since in a superposition phase differences are of great importance.
Now write the combined beam as

$$|E\rangle = e^{i\alpha}|E_1\rangle + e^{i\beta}|E_2\rangle. \qquad (4.23)$$

The matrix M(E) is then given by

$$M(E) = M(E_1) + M(E_2) + e^{i(\alpha-\beta)}|E_1\rangle\langle E_2| + e^{-i(\alpha-\beta)}|E_2\rangle\langle E_1|.$$
Using Eq. (4.11) and the relation 
Tr (A + B) = Tr (A) + Tr (B), 


we see that the Stokes parameters of the superposition would be the sum of 
the Stokes parameters of the individual states if the terms |E_1><E_2| and
|E_2><E_1| could be ignored.

In Section 4.2 the interpretation of the Stokes parameters in terms of 
intensity measurements was discussed. Since an intensity measurement must 
be carried out over a finite length of time, in practice we are interested in the 
time average of M(E), which we indicate by $\overline{M(E)}$. The experimental fact
that the Stokes parameters of beams from independent sources add therefore
implies that

$$\overline{e^{i(\alpha-\beta)}} = \overline{e^{-i(\alpha-\beta)}} = 0.$$

Thus, the relative phase between independent sources is highly unstable. For
independent sources, a superposition in which the phase factor e^{i(α−β)}
averages to zero and for which Eq. (4.22) is valid is called a completely
incoherent superposition.
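The role of the averaged phase can be seen in a small numerical experiment. The Python sketch below averages the Stokes vector of e^{iα}|E_1> + e^{iβ}|E_2> over equally spaced relative phases, an idealized stand-in for the wandering phase of independent sources, and compares the result with the sum of the individual Stokes vectors and with a single fixed-phase (coherent) superposition. The states, the 64-point phase grid, and the sigma ordering are assumptions of the sketch.

```python
import cmath
import math

# Assumed sigma ordering: sigma_1 = diag(1, -1), sigma_2 = [[0, 1], [1, 0]], sigma_3 = [[0, -i], [i, 0]].
SIGMA = [
    [[1, 0], [0, 1]],
    [[1, 0], [0, -1]],
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
]


def stokes(ket):
    """P_j = <E|sigma_j|E>."""
    return [sum(ket[i].conjugate() * s[i][k] * ket[k]
                for i in range(2) for k in range(2)).real
            for s in SIGMA]


def superpose(alpha, e1, beta, e2):
    """e^{i alpha}|E_1> + e^{i beta}|E_2>, as in Eq. (4.23)."""
    return tuple(cmath.exp(1j * alpha) * x + cmath.exp(1j * beta) * y
                 for x, y in zip(e1, e2))


def phase_averaged_stokes(e1, e2, samples=64):
    """Average the Stokes vector over equally spaced relative phases alpha - beta;
    the cross terms e^{+-i(alpha - beta)} then average to zero."""
    total = [0.0] * 4
    for k in range(samples):
        p = stokes(superpose(2 * math.pi * k / samples, e1, 0.0, e2))
        total = [t + x for t, x in zip(total, p)]
    return [t / samples for t in total]


e1, e2 = (1.0, 0.0), (0.0, 1.0)                   # |P_x> and |P_y>
summed = [x + y for x, y in zip(stokes(e1), stokes(e2))]
incoherent = phase_averaged_stokes(e1, e2)
coherent = stokes(superpose(0.0, e1, 0.0, e2))    # fixed relative phase
print(summed, incoherent, coherent)
```

With |P_x> and |P_y>, the phase-averaged beam is unpolarized, (2, 0, 0, 0), matching the sum, while the fixed-phase superposition is the fully polarized (2, 0, 2, 0).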

Since the Stokes parameters add for incoherent superposition, it is easy 
to see that a partially polarized beam may be represented as the incoherent 
sum of a pure polarization state and a state of zero polarization. Let the 
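given state have parameters I, P_1, P_2, P_3. This vector may be written

$$\begin{pmatrix}I\\P_1\\P_2\\P_3\end{pmatrix} = \begin{pmatrix}I_1\\P_1\\P_2\\P_3\end{pmatrix} + \begin{pmatrix}I_2\\0\\0\\0\end{pmatrix},$$

where I_1 is given by

$$I_1 = \sqrt{P_1^2 + P_2^2 + P_3^2}$$

and

$$I_2 = I - I_1.$$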
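The decomposition into a pure part and an unpolarized part, together with the degree of polarization defined earlier, amounts to a few lines of arithmetic. In this Python sketch the beam (1, 0.3, 0.4, 0) is an arbitrary example; its degree of polarization is 0.5, so half its intensity is in the pure part. The function names are illustrative.

```python
import math


def degree_of_polarization(s):
    """sqrt(P_1^2 + P_2^2 + P_3^2) / P_0 for a Stokes vector s = (P_0, P_1, P_2, P_3)."""
    return math.sqrt(s[1] ** 2 + s[2] ** 2 + s[3] ** 2) / s[0]


def split_beam(s):
    """Split (I, P_1, P_2, P_3) into a pure part (I_1, P_1, P_2, P_3) and an
    unpolarized part (I_2, 0, 0, 0), with I_1 = sqrt(P_1^2 + P_2^2 + P_3^2)."""
    i1 = math.sqrt(s[1] ** 2 + s[2] ** 2 + s[3] ** 2)
    return [i1, s[1], s[2], s[3]], [s[0] - i1, 0.0, 0.0, 0.0]


beam = [1.0, 0.3, 0.4, 0.0]            # illustrative partially polarized beam
pure, unpolarized = split_beam(beam)
print(degree_of_polarization(beam), pure, unpolarized)
```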




A useful representation of a completely unpolarized beam is that of an 
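incoherent superposition of any two pure orthogonal states. Since
orthogonal states have antipodal Stokes vectors, we may write

$$\begin{pmatrix}1\\0\\0\\0\end{pmatrix} = \begin{pmatrix}1/2\\P_1\\P_2\\P_3\end{pmatrix} + \begin{pmatrix}1/2\\-P_1\\-P_2\\-P_3\end{pmatrix},$$

with

$$1/2 = \sqrt{P_1^2 + P_2^2 + P_3^2}.$$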


The relative phase instability which is assumed in incoherent superposition 
casts doubt on the validity of the superposition in Eq. (4.23). It is the beauty 
of the Stokes formalism that we may bypass this equation. 

The path we have taken to arrive at the Stokes vector formulation for 
describing light, in both pure states and mixed states, is based only on the 
use of pure states. This is seen in the mathematical definition of the Stokes 
parameters as the expansion coefficients of a state which is the coherent 
superposition of two pure P-states. Thus, we appear to have reached an 
impasse; for although we can now describe nonpure states by Stokes
parameters, we cannot define the parameters in this way, simply because the
state is not pure.

This difficulty is only apparent, however, for we can determine the 
Stokes parameters of a nonpure state in exactly the same way as we must 
ultimately find them for a pure state: namely, by experimental measurement,
using the operational definitions of the Stokes parameters discussed in
connection with Eq. (4.13).

Now, any beam of light may be considered as an incoherent super- 
position of a totally unpolarized beam and a completely polarized beam in 
some pure state. To determine the state of the polarized component, we 
place a general elliptical filter in the beam and vary its transmission state 
until we obtain maximum transmission. Since any E-filter will pass half of
an unpolarized beam, the transmitted intensity is the intensity of the polarized 
part of the beam plus half of the unpolarized intensity. The decrease in 
intensity is that half of the unpolarized beam which was absorbed in the 
E-filter. To justify the use of the Mueller matrices or the generalized Jones 
calculus on nonpure beams, we need only consider them as acting on the 
polarized component, leaving the unpolarized part unchanged in the case of 
retarders, and diminishing it by half in the case of E-filters. We must 
remember, however, that the entire emergent beam from any E-filter will be 
completely polarized. 

Finally, it should be remarked that Eqs. (4.18) and (4.19), which were 




derived in Section 4.4, remain valid for those cases where a partially polarized 
or unpolarized beam is incident on a series of optical devices consisting of 
retarders and polarizers. However, since the kets in the bracket products 
represent pure states, the bracket products in those expressions have no 
meaning when the incident light is either partially polarized or unpolarized. 
Consequently, in these cases we need only rewrite the left-hand side of these
equations, giving 


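$$W = \tfrac{1}{2}\,(1, P_1', P_2', P_3')\begin{pmatrix}P_0\\P_1\\P_2\\P_3\end{pmatrix},$$

$$W = \tfrac{1}{2}\,(1, P_1', P_2', P_3')\,\mathcal{M}\begin{pmatrix}P_0\\P_1\\P_2\\P_3\end{pmatrix},$$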


where the interpretation in terms of transmitted intensities or probabilities 
is the same as before, but these probabilities, W, may not be written as bracket 
products. As a simple example, let us consider again Malus' law. Assume 
that a P_x-filter is used to polarize incident unpolarized light of intensity I,
and that a second P-filter at an orientation θ relative to the polarizer is used
as an analyzer. Since unpolarized light has the Stokes parameters (I, 0, 0, 0)
and the Stokes parameters of the state transmitted by the analyzer are
given by Eq. (4.15), we have

$$W = \tfrac{1}{2}\,(1, a, b, 0)\;\tfrac{1}{2}\begin{pmatrix}1&1&0&0\\1&1&0&0\\0&0&0&0\\0&0&0&0\end{pmatrix}\begin{pmatrix}I\\0\\0\\0\end{pmatrix},$$

where

a = cos²θ − sin²θ,   b = 2 sin θ cos θ,


and the Mueller matrix (Appendix II) has been chosen such that the line of 
the polarizing P-filter is along the x-axis. Carrying through the matrix 
multiplication gives the result 


$$W = \tfrac{1}{4}I(1 + a) = \tfrac{1}{4}I(1 + \cos 2\theta) = \frac{I}{2}\cos^2\theta,$$

where I/2 is the intensity of the light transmitted by the polarizer.
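The Malus-law calculation can be reproduced by building the P_x-filter's Mueller matrix from its Jones matrix diag(1, 0) with Eq. (4.17), rather than reading it from Appendix II. This Python sketch uses the same assumed sigma ordering as before and should return (I/2) cos²θ for every analyser angle θ; the function name malus_w is illustrative.

```python
import math

# Assumed sigma ordering: sigma_1 = diag(1, -1), sigma_2 = [[0, 1], [1, 0]], sigma_3 = [[0, -i], [i, 0]].
SIGMA = [
    [[1, 0], [0, 1]],
    [[1, 0], [0, -1]],
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def dagger(a):
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]


def mueller(m):
    """Eq. (4.17): M_ij = (1/2) Tr(sigma_i m sigma_j m^dagger)."""
    md = dagger(m)
    out = []
    for i in range(4):
        row = []
        for j in range(4):
            prod = matmul(matmul(SIGMA[i], m), matmul(SIGMA[j], md))
            row.append(0.5 * (prod[0][0] + prod[1][1]).real)
        out.append(row)
    return out


def malus_w(intensity, theta):
    """W = (1/2) P'^T M P: unpolarized light through a P_x-filter, analysed by a
    P-filter at angle theta whose Stokes vector is given by Eq. (4.15)."""
    big_m = mueller([[1, 0], [0, 0]])               # Jones matrix of a P_x-filter
    beam = [intensity, 0.0, 0.0, 0.0]               # unpolarized light
    analyser = [1.0, math.cos(theta) ** 2 - math.sin(theta) ** 2,
                2 * math.sin(theta) * math.cos(theta), 0.0]
    return 0.5 * sum(analyser[i] * big_m[i][j] * beam[j]
                     for i in range(4) for j in range(4))


for degrees in (0, 30, 60, 90):
    print(degrees, malus_w(1.0, math.radians(degrees)))
```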


PROBLEMS 


4.1 Obtain the Stokes parameters for the states given in Problems 3.1 and 3.3.

Answer: |E_1>: P_0 = 1, P_1 = 6√3/13, …; |E_2>: P_0 = 1, P_1 = 0, P_2 = −12/13, P_3 = −5/13.


4.2 Show that the operator |E><E| is not in general idempotent if <E|E> ≠ 1.
How can it be made idempotent? Does the proof of Eq. (4.3) depend on whether 
or not the operator is idempotent? 

4.3 Prove for any two square matrices A and B of equal dimension that
Tr(AB) = Tr(BA) and, hence, that

<E|A|E> = Tr(A|E><E|) = Tr(|E><E|A).


4.4 Expand the matrix

$$\begin{pmatrix}1 & 2\\ 3 & 4\end{pmatrix}$$

in terms of the sigma matrices and find the Stokes parameters for this matrix. Are
the Stokes parameters real?

Answer: P_0 = 5, P_1 = −3, P_2 = 5, P_3 = −i.


4.5 A matrix is anti-Hermitian if A† = −A. Prove that the Stokes parameters
of an anti-Hermitian matrix are pure imaginary numbers. 


4.6 For any two states E_1 and E_2 consider the two operators |E_1><E_2| and
|E_2><E_1|. If these two operators are expanded in terms of the sigma matrices,
prove that the Stokes parameters of the one operator are the complex conjugates 
of those for the other. 


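4.7 Show that the matrix

$$M = \cos\frac{\Omega}{2}\,\sigma_0 + i\sin\frac{\Omega}{2}\,\mathbf{P}\cdot\boldsymbol{\sigma},$$

where

$$\mathbf{P}\cdot\boldsymbol{\sigma} = P_1\sigma_1 + P_2\sigma_2 + P_3\sigma_3,$$

operating on any Jones vector leaves the length of the associated Poincaré vector
unchanged if

$$1 = P_1^2 + P_2^2 + P_3^2.$$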


4.8 Let the matrix of Problem 4.7 operate on any two Jones vectors and produce 
two new Jones vectors. Show that the angle between the Poincaré vectors associated 
with these states is the same for the new states as for the old. 


4.9 Use the results of Problems 4.7 and 4.8 to show that, when the matrix of 
Problem 4.7 operates on all Jones vectors, the effect on the associated Poincaré 




vectors is equivalent to a rigid rotation of the Poincaré sphere about some axis. 
Can you specify the direction of this axis? Can you determine the angle of rotation 
about this axis? 

4.10 Repeat Problems 3.8 and 3.9 using the Poincaré sphere to eliminate all 
algebraic calculations. 

4.11 In Section 4.6 the individual sigma matrices were interpreted as instrument 
operators and the physical significance of the multiplication rule σ_iσ_j = iσ_k
was discussed. Discuss this multiplication rule in terms of rotations of the Poincaré 
sphere. 

4.12 Hamilton's quaternions are “hypercomplex” numbers with three different
“complex” parts. A quaternion may be written in terms of four real numbers
a, b, c, and d and three “imaginary” symbols i, j, and k, as

$$Q = a + bi + cj + dk.$$

The algebra of quaternions is defined by the multiplication table for the symbols
i, j, and k given below.

$$\begin{array}{c|ccc} & i & j & k\\ \hline i & -1 & k & -j\\ j & -k & -1 & i\\ k & j & -i & -1 \end{array}$$

The row of the table determines the first symbol in the product and the column
the second, e.g., ij = k. Why would you expect that quaternions might be useful
for describing polarization states?


4.13 If the Poincaré vector of a particular polarization state is given, how would 
you geometrically determine the length and direction of the Poincaré vector of this 
beam after it has gone through a P-filter? How would you determine geometrically 
the Poincaré vector of this beam after it has gone through any polarization filter? 


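4.14 The Stokes parameters for a state |E> were obtained from

$$|E\rangle\langle E| = \tfrac{1}{2}\sum_{j=0}^{3} P_j\sigma_j,$$

where |E> was written as

|E> = A|P_x> + B|P_y>.

If |E> had been written as

|E> = A|R> + B|L>,

what polarization states would correspond to the Stokes vectors

$$\begin{pmatrix}1\\\pm 1\\0\\0\end{pmatrix},\qquad \begin{pmatrix}1\\0\\\pm 1\\0\end{pmatrix},\qquad \begin{pmatrix}1\\0\\0\\\pm 1\end{pmatrix}?$$

Answer: |R>, |L>; |P_x>, |P_y>; the P-states at +45° and −45° to the x-axis.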




4.15 If you were asked to design a series of linear and circular retarders which 
would alter every polarization state that passed through the series except the states 


1 1 
[E> = — Q|P» + 4|P,>), |E2> = — (4 |P> + 3 |Py>), 
v5 V5 


why would you politely refuse? 

4.16 If the Jones matrix for an optical device is Hermitian, M† = M, show that
the corresponding Mueller matrix, as defined by Eq. (4.17), is symmetric.

4.17 A piece of optically active material is of the correct thickness to rotate a
|P_x>-state into the state (1/√2)(|P_x> + |P_y>). A |P_x>-state passes through this
same piece of material and then through a linear retarder. It emerges as a
|P_x>-state once again. What fractional wave plate was it and what was the
orientation of its fast axis?


Answer: Half-wave, 22.5° to x-axis. 


CHAPTER 5 


THE METHODS OF QUANTUM MECHANICS 


Given a series of polarizers and retarders, we have seen in the previous 
chapters how we may predict the emergent intensity which will be measured. 
Likewise, we know how the Stokes parameters and, hence, the polarization 
state of a beam may be measured. In this chapter we are going to extend 
this theory to measurements of other physical quantities, such as energy and 
linear momentum. 

We shall examine how the probable outcome of a certain type of polariz- 
ation measurement may be phrased within the formalism of bras and kets. 
It will then be shown that it is reasonable to believe that the outcome of any 
measurement can be expressed in the same formalism in terms of a quantity 
commonly known as the "expectation value". This leads naturally to a 
discussion of the significance of commuting operators which in principle allow 
a determination of the constants of “motion” of a system. The material, 
which up to this point is handled mainly within the framework of finite 
dimensional linear vector spaces, is then extended to indicate how continuous 
functions may be fitted within the framework of vector spaces. 

Since the purpose of this chapter is to serve as an introduction to the 
formulation of physics in terms of quantum mechanics, it is appropriate that 
the chapter concludes with one of the very important implications of this 
formulation, known as the uncertainty relations. 


5.1 EIGENSTATES AND EIGENVALUES 
The problem which we are going to examine in the following discussion is a 
very common one in physics, though at first glance it may appear unfamiliar. 
A routine example in differential equations is 
$$\frac{d}{dx}f(x) = \lambda f(x), \qquad (5.1)$$




where λ is a constant. The solution of this equation answers the question,
“What function, when operated on by d/dx, gives back the same function
multiplied by a constant?” We know the answer, and in fact Eq. (5.1) may
be taken as the definition of the exponential function, f(x) = e^{λx}. The key
word we have used above is “operated”. There are many ways in which
we may operate on a function. We may differentiate, integrate, multiply by
scalars, etc. Any of these operations applied to a function f(x) will in general
yield a new function g(x). If we let the symbol 𝒪 stand for any admissible
operation and let 𝒪f(x) represent the result of operating on f(x) with 𝒪, we
may pose the following mathematical problem:

$$\mathcal{O}f(x) = \lambda f(x). \qquad (5.2)$$

That is, do there exist functions f(x) which the operation modifies by at most
a multiplicative constant λ? Equations (5.1) and (5.2) are known as eigenvalue
problems. If solutions f(x) exist, they are called eigenfunctions and the
values of λ are called eigenvalues. Thus, e^{λx} is an eigenfunction of the
operator d/dx with eigenvalue λ.

Eigenvalue problems are not limited to operations on functions such as 
f(x). One may easily consider such problems where the operation is matrix 
multiplication. Thus, if M is a given matrix, one can consider 


$$M|E\rangle = \lambda|E\rangle. \qquad (5.3)$$


That is, are there vectors which when multiplied by M are modified at most 
by scalar multiplication? If these vectors exist, they are called eigenvectors 
of the matrix M. 

A physical interpretation of Eq. (5.3) is easily given. Since we have 
represented polarizing devices by matrices, this equation asks whether there 
exist polarization states which can pass through a given device with at most 
a modification in intensity or arbitrary phase. This modification is repre- 
sented by the scalar multiplication. If such states exist, they will be called 
eigenstates of the device. As a trivial example: any state is an eigenstate of 
a simple absorber, since it changes only the intensity of the state and not its 
polarization. Eigenvalue problems are of considerable importance but often 
of such complexity that the eigenvalues are not easily accessible. One of the 
advantages of the 2 x 2-matrix representation of optical devices is that the 
eigenvalues and eigenstates for such devices can readily be obtained. 

Since the optical devices we have considered can be represented by 
2 x 2-matrices, let D be a general 2 x 2-matrix for which 


DIE> = 2|E», 


(i 72) (3) =la) T 


98 THE METHODS OF QUANTUM MECHANICS 5.1 


where $\lambda$ is the eigenvalue. Of course, this matrix equation is equivalent to the pair of simultaneous linear equations

$$\begin{aligned} (d_{11} - \lambda)A + d_{12}B &= 0, \\ d_{21}A + (d_{22} - \lambda)B &= 0, \end{aligned} \qquad (5.5)$$


which possesses a nontrivial solution only if the determinant of the coefficients of $A$ and $B$ is zero, i.e.,

$$\begin{vmatrix} d_{11}-\lambda & d_{12} \\ d_{21} & d_{22}-\lambda \end{vmatrix} = 0. \qquad (5.6)$$

The solution of this quadratic equation in $\lambda$ is

$$\lambda = \tfrac{1}{2}\left[(d_{22} + d_{11}) \pm \sqrt{(d_{22} + d_{11})^2 - 4(d_{11}d_{22} - d_{12}d_{21})}\right].$$

The term $(d_{22} + d_{11})$ is just $\operatorname{Tr} D$, and $(d_{11}d_{22} - d_{12}d_{21})$ is the determinant of the matrix $D$, which we indicate by $|D|$. The eigenvalues can now be written as

$$\lambda_{1,2} = \tfrac{1}{2}\left[\operatorname{Tr} D \pm \sqrt{(\operatorname{Tr} D)^2 - 4|D|}\right]. \qquad (5.7)$$
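Equations (5.5) and (5.7) translate directly into a small routine. The sketch below, assuming NumPy, computes both eigenvalues from the trace and determinant, recovers each eigenvector from the ratio $B/A$ given by the first of Eqs. (5.5) (it assumes $d_{12} \neq 0$), normalizes it, and compares with `numpy.linalg.eigvals`. The helper names and the test matrix are illustrative.

```python
import numpy as np

def eig2x2(D):
    """Both eigenvalues of a 2x2 matrix from Eq. (5.7): (Tr D +/- sqrt((Tr D)^2 - 4|D|)) / 2."""
    tr = D[0, 0] + D[1, 1]
    det = D[0, 0] * D[1, 1] - D[0, 1] * D[1, 0]
    root = np.sqrt(complex(tr * tr - 4 * det))   # complex sqrt: works for any discriminant
    return (tr + root) / 2, (tr - root) / 2

def eigvec2x2(D, lam):
    """Normalized eigenvector from the first of Eqs. (5.5); assumes d12 != 0."""
    B_over_A = (lam - D[0, 0]) / D[0, 1]         # (d11 - lam) A + d12 B = 0
    v = np.array([1.0, B_over_A], dtype=complex)
    return v / np.linalg.norm(v)

D = np.array([[2.0, 1.0], [1.0, 3.0]])           # arbitrary illustrative matrix
lams = eig2x2(D)
for lam in lams:
    v = eigvec2x2(D, lam)
    print(lam, np.allclose(D @ v, lam * v))      # True for both eigenvalues
print(np.sort(np.linalg.eigvals(D)))             # library check
```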


If the matrix D is diagonal, it is immediately evident from Eq. (5.6) that the 
eigenvalues are simply the diagonal elements. Once the eigenvalues are 
determined, the eigenstates can be obtained from Eq. (5.5) by substituting 
the eigenvalues and solving for the ratios A/B or B/A. To obtain A and B 
themselves requires a further normalization condition for the eigenvectors. 
As an illustration of this, consider the transmission matrix of a P-filter 
(Problem 3.4): 


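$$T = \begin{pmatrix} \cos^2\theta & \cos\theta\sin\theta \\ \cos\theta\sin\theta & \sin^2\theta \end{pmatrix}.$$

Very quickly we find

$$\operatorname{Tr} T = 1, \qquad |T| = 0,$$

which, from Eq. (5.7), gives the eigenvalues

$$\lambda_1 = 1, \qquad \lambda_2 = 0.$$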




Since $T$ is the transmission matrix for a P-filter whose line is oriented at an angle $\theta$ relative to the x-axis, the eigenstate with eigenvalue $\lambda_1 = 1$ should be the P-state $|P\rangle = \cos\theta\,|P_x\rangle + \sin\theta\,|P_y\rangle$. Let us check this. According to Eq. (5.5), for $\lambda_1 = 1$,

$$\begin{aligned} (\cos^2\theta - 1)A + \cos\theta\sin\theta\,B &= 0, \\ \cos\theta\sin\theta\,A + (\sin^2\theta - 1)B &= 0, \end{aligned}$$

which are satisfied for

$$B/A = \sin\theta/\cos\theta,$$

and the suspicion that $|P\rangle$ is the correct eigenstate is confirmed. It is easily verified that the orthogonal state $|P'\rangle$, with $\langle P'|P\rangle = 0$, corresponds to the eigenvalue $\lambda_2 = 0$. It is left as an exercise to show that the eigenvalues and eigenstates of the instrument operators $\sigma_i$ are as given in the tabulation below.


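Operator       Eigenvalue     Eigenstate

$\sigma_1$     $+1$           $|P_x\rangle$
$\sigma_1$     $-1$           $|P_y\rangle$
$\sigma_2$     $+1$           $|P_x\rangle + |P_y\rangle$
$\sigma_2$     $-1$           $|P_x\rangle - |P_y\rangle$
$\sigma_3$     $+1$           $|R\rangle$
$\sigma_3$     $-1$           $|L\rangle$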


This agrees with the discussion of Section 4.6. 
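The P-filter example and the tabulation can both be checked numerically. The sketch below assumes NumPy and the basis $|P_x\rangle = (1, 0)$, $|P_y\rangle = (0, 1)$. Here $\sigma_1$ is the diagonal matrix of Eq. (4.9), while the explicit forms used for $\sigma_2$ and $\sigma_3$, and the phase convention $|R\rangle \propto |P_x\rangle + i|P_y\rangle$, are assumptions chosen to be consistent with the table rather than quotations from Chapter 4.

```python
import numpy as np

def p_filter(theta):
    """Transmission matrix T of a P-filter whose line is at angle theta to the x-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c * c, c * s], [c * s, s * s]])

# Assumed explicit forms in the basis |P_x>, |P_y> (sigma_2, sigma_3 are assumptions)
sigma = {
    1: np.array([[1, 0], [0, -1]], dtype=complex),
    2: np.array([[0, 1], [1, 0]], dtype=complex),
    3: np.array([[0, -1j], [1j, 0]], dtype=complex),
}
Px, Py = np.array([1, 0]), np.array([0, 1])
R, L = (Px + 1j * Py) / np.sqrt(2), (Px - 1j * Py) / np.sqrt(2)   # assumed convention

table = [(1, +1, Px), (1, -1, Py),
         (2, +1, (Px + Py) / np.sqrt(2)), (2, -1, (Px - Py) / np.sqrt(2)),
         (3, +1, R), (3, -1, L)]

theta = 0.4
T = p_filter(theta)
P = np.array([np.cos(theta), np.sin(theta)])        # |P>
P_perp = np.array([-np.sin(theta), np.cos(theta)])  # |P'>, orthogonal to |P>
print(np.trace(T), np.linalg.det(T))                # Tr T = 1, |T| = 0 (to round-off)
print(np.allclose(T @ P, P), np.allclose(T @ P_perp, 0))
for i, lam, ket in table:
    print(i, lam, np.allclose(sigma[i] @ ket, lam * ket))
```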


5.2 HERMITIAN OPERATORS


Because of their unique mathematical properties, linear Hermitian operators 
play an important role in physics. An operator is linear if it obeys the 
equation 


$$\mathcal{O}\left[a|E_1\rangle + b|E_2\rangle\right] = a\,\mathcal{O}|E_1\rangle + b\,\mathcal{O}|E_2\rangle,$$

where $|E_1\rangle$ and $|E_2\rangle$ are any two kets and $a$ and $b$ are any scalars. Recall that a Hermitian matrix is one which is equal to its complex conjugate transpose. The meaning of the Hermitian property for other linear operators
will be defined as they are encountered. In this section we will prove a 
number of the more important mathematical properties of Hermitian 
operators. The statements and proofs will be specifically given for Hermitian 
matrices but will be valid for Hermitian operators in general. 




The eigenvalues of a Hermitian matrix are real 
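If $|E\rangle$ is an eigenvector of $M$ with eigenvalue $\lambda$, then

$$M|E\rangle = \lambda|E\rangle$$

and

$$\langle E|M^\dagger = \langle E|\lambda^*.$$

Multiplying the first equation from the left by $\langle E|$ and the second from the right by $|E\rangle$, we obtain

$$\langle E|M|E\rangle = \lambda\langle E|E\rangle, \qquad \langle E|M^\dagger|E\rangle = \lambda^*\langle E|E\rangle.$$

Now, if $M^\dagger = M$, the left-hand sides are equal, and since $\langle E|E\rangle \neq 0$ we must have $\lambda^* = \lambda$.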
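A numerical spot check of this result, as a sketch assuming NumPy: build a random Hermitian matrix as $X + X^\dagger$, solve for its eigenvalues with the general (non-Hermitian) routine `numpy.linalg.eigvals`, and confirm that the imaginary parts vanish to round-off. The seed and matrix size are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)                  # arbitrary seed
X = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
M = X + X.conj().T                              # equal to its conjugate transpose

lam = np.linalg.eigvals(M)                      # general solver: realness is not imposed
print(np.max(np.abs(lam.imag)))                 # at round-off level

# The step used in the proof: <E|M|E> / <E|E> reproduces lambda and is real
w, V = np.linalg.eig(M)
E = V[:, 0]
ratio = (E.conj() @ M @ E) / (E.conj() @ E)
print(ratio, w[0])
```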


Eigenvectors of a Hermitian matrix belonging to different eigenvalues are 
orthogonal 


Let $|E_i\rangle$ and $|E_j\rangle$ be two eigenvectors of the Hermitian matrix $M$ with eigenvalues $\lambda_i$ and $\lambda_j$. Thus,

$$M|E_i\rangle = \lambda_i|E_i\rangle, \qquad (5.8)$$
$$M|E_j\rangle = \lambda_j|E_j\rangle. \qquad (5.9)$$

Multiplying these equations from the left by $\langle E_j|$ and $\langle E_i|$, respectively, yields

$$\langle E_j|M|E_i\rangle = \lambda_i\langle E_j|E_i\rangle, \qquad (5.10)$$
$$\langle E_i|M|E_j\rangle = \lambda_j\langle E_i|E_j\rangle. \qquad (5.11)$$

Taking the complex conjugate of the last equation and remembering that, if $M$ is Hermitian, $M = M^\dagger$ and $\lambda_j^* = \lambda_j$, as well as $\langle E_i|E_j\rangle^* = \langle E_j|E_i\rangle$, we obtain

$$\langle E_j|M|E_i\rangle = \lambda_j\langle E_j|E_i\rangle. \qquad (5.12)$$

Subtracting Eq. (5.12) from Eq. (5.10), we obtain

$$(\lambda_i - \lambda_j)\langle E_j|E_i\rangle = 0. \qquad (5.13)$$

Thus, if $\lambda_i \neq \lambda_j$, $\langle E_j|E_i\rangle$ must be zero and the states are orthogonal. Consequently, if all of the eigenvalues of $M$ are different, all eigenstates are mutually orthogonal. It often happens that different eigenstates of an operator have the same eigenvalue. These are known as degenerate states, and in this case our proof is inconclusive. One may show, however, that in the degenerate case a set of mutually orthogonal eigenkets can be found. Since we will not be greatly concerned with degeneracy in the remainder of this text, we will not give the general proof.
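Both halves of this discussion can be seen numerically in a short NumPy sketch. The eigenvectors returned by the general routine `numpy.linalg.eig` (which does not enforce orthogonality) come out mutually orthogonal for a random Hermitian matrix, whose eigenvalues are distinct with probability one. For a degenerate diagonal matrix, two non-orthogonal eigenkets with the same eigenvalue are exhibited, alongside the orthonormal set returned by `numpy.linalg.eigh`.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
M = X + X.conj().T                     # Hermitian; eigenvalues almost surely distinct

w, V = np.linalg.eig(M)                # columns have unit norm; orthogonality not imposed
gram = V.conj().T @ V                  # matrix of brackets <E_i|E_j>
print(np.round(np.abs(gram), 10))      # the identity: eigenkets are orthonormal

# Degenerate case: eigenvalue 1 occurs twice, so any combination of the
# first two basis kets is an eigenket, and orthogonality is not automatic
Md = np.diag([1.0, 1.0, 2.0])
a = np.array([1.0, 0.0, 0.0])
b = np.array([1.0, 1.0, 0.0]) / np.sqrt(2)
print(np.allclose(Md @ a, a), np.allclose(Md @ b, b), a @ b)   # eigenkets, not orthogonal

wd, Vd = np.linalg.eigh(Md)            # an orthonormal set can nevertheless be chosen
print(np.allclose(Vd.conj().T @ Vd, np.eye(3)))
```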




An operator M may be expressed in terms of its eigenstates and eigenvalues 


The proof is best given by demonstrating the result. Let the eigenvectors of $M$ be $|E_k\rangle$ with eigenvalues $\lambda_k$. We construct the matrix

$$M' = \sum_k \lambda_k|E_k\rangle\langle E_k|. \qquad (5.14)$$

Assuming that the $|E_k\rangle$ are orthonormal, we operate on $|E_j\rangle$ with $M'$ and obtain

$$M'|E_j\rangle = \sum_k \lambda_k|E_k\rangle\langle E_k|E_j\rangle = \sum_k \lambda_k|E_k\rangle\,\delta_{kj} = \lambda_j|E_j\rangle.$$

Thus, the eigenstates and eigenvalues of $M$ and $M'$ are identical. It is left to the exercises to show that this is sufficient to identify $M$ and $M'$.
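Equation (5.14) can be checked directly. The NumPy sketch below takes an arbitrary Hermitian matrix, obtains an orthonormal set of eigenkets with `numpy.linalg.eigh`, rebuilds $M'$ as a sum of outer products $\lambda_k|E_k\rangle\langle E_k|$, and confirms that it reproduces $M$ and acts on each $|E_j\rangle$ as multiplication by $\lambda_j$.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
M = X + X.conj().T                     # an arbitrary Hermitian test matrix

lam, E = np.linalg.eigh(M)             # orthonormal eigenkets as the columns of E

# Eq. (5.14): M' = sum_k lambda_k |E_k><E_k|, each term an outer product
M_prime = sum(l * np.outer(E[:, k], E[:, k].conj()) for k, l in enumerate(lam))

print(np.allclose(M_prime, M))         # True
for j in range(3):                     # M'|E_j> = lambda_j |E_j>
    print(np.allclose(M_prime @ E[:, j], lam[j] * E[:, j]))
```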


A necessary and sufficient condition that two operators M and N possess a 
common set of eigenstates is that MN = NM 


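Let us first assume that a common set of eigenstates exists, which we shall label $|\mu_i, \lambda_j\rangle$, so that

$$M|\mu_i, \lambda_j\rangle = \lambda_j|\mu_i, \lambda_j\rangle, \qquad N|\mu_i, \lambda_j\rangle = \mu_i|\mu_i, \lambda_j\rangle.$$

Thus,

$$MN|\mu_i, \lambda_j\rangle = \mu_i M|\mu_i, \lambda_j\rangle = \mu_i\lambda_j|\mu_i, \lambda_j\rangle$$

and

$$NM|\mu_i, \lambda_j\rangle = \lambda_j N|\mu_i, \lambda_j\rangle = \lambda_j\mu_i|\mu_i, \lambda_j\rangle.$$

Since $\mu_i$ and $\lambda_j$ are scalars, $\mu_i\lambda_j = \lambda_j\mu_i$, and the order in which $M$ and $N$ act on any common eigenstate is unimportant. Because these eigenstates form a complete set, $MN$ and $NM$ then agree on every ket, and $MN = NM$: the condition is necessary. To prove that it is also sufficient, we assume that $MN = NM$. Let $|\mu_i\rangle$ be an eigenstate of $N$ with eigenvalue $\mu_i$:

$$N(M|\mu_i\rangle) = MN|\mu_i\rangle = \mu_i(M|\mu_i\rangle). \qquad (5.15)$$

That is, $M|\mu_i\rangle$ is itself an eigenstate of $N$ with eigenvalue $\mu_i$. If $N$ is nondegenerate, $M|\mu_i\rangle$ must therefore be at most a scalar times $|\mu_i\rangle$. Thus,

$$M|\mu_i\rangle = a|\mu_i\rangle,$$

and $|\mu_i\rangle$ is also an eigenvector of $M$. If, on the other hand, $N$ is degenerate and there are several eigenstates with eigenvalue $\mu_i$, then Eq. (5.15) implies at most that $M|\mu_i\rangle$ is a linear combination of these degenerate states. For reasons given before, we will not give the more general proof but simply state that in the degenerate case a common set of eigenstates can be found.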
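A numerical sketch of the theorem, assuming NumPy. Here $M$ is built as a polynomial in a nondegenerate Hermitian $N$, so the two commute by construction, and each eigenket of $N$ is then found to be an eigenket of $M$ as well. For contrast, $\sigma_1 = \mathrm{diag}(1, -1)$ of Eq. (4.9) and an assumed $\sigma_2$ with rows $(0, 1)$ and $(1, 0)$ do not commute, consistent with their different eigenstates in the table of Section 5.1.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
N = X + X.conj().T                       # Hermitian, nondegenerate (almost surely)
M = N @ N - 2.0 * N                      # a polynomial in N, so MN = NM by construction

print(np.allclose(M @ N, N @ M))         # True: the operators commute

mu, V = np.linalg.eigh(N)                # eigenkets |mu_i> of N
for i in range(3):
    v = V[:, i]
    a = v.conj() @ M @ v                 # candidate eigenvalue of M
    print(np.allclose(M @ v, a * v))     # each |mu_i> is also an eigenket of M

# A non-commuting pair for contrast (sigma_2 is an assumed explicit form)
s1 = np.array([[1, 0], [0, -1]])
s2 = np.array([[0, 1], [1, 0]])
print(np.allclose(s1 @ s2, s2 @ s1))     # False
```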




5.3 A CLASS OF EXPERIMENTS 


We are going to examine a class of experiments which, although relatively simple, can serve as prototypes for any experiment designed to measure a single physical quantity. This alone is enough to recommend the investigation, but we shall see that one of these experiments is actually concerned with the measurement of angular momentum and is, hence, of concrete importance.

Consider any pure polarization state with ket $|S\rangle$ and a given pair of orthogonal states, $|E\rangle$ and $|E'\rangle$. The Poincaré vectors of these states are shown in Fig. 5.1 and indicated as $\mathbf{E}$, $\mathbf{E}'$, and $\mathbf{S}$. For the given states $|E\rangle$ and $|E'\rangle$, a real scalar quantity associated with the state $|S\rangle$ is the projection of its Poincaré vector $\mathbf{S}$ along the direction of $\mathbf{E}$. Since we are taking all of these vectors to be of unit length, this projection is simply $\mathbf{S}\cdot\mathbf{E}$. Although this quantity is suggested by our formalism, it is not obvious that it is measurable. To show that it can be obtained from experiment, we recall that the probability that the state $|S\rangle$ will pass through an E-filter is $|\langle S|E\rangle|^2$. By Eq. (4.18) we have

$$|\langle S|E\rangle|^2 = \tfrac{1}{2}(1 + \mathbf{S}\cdot\mathbf{E}).$$

Thus,

$$\mathbf{S}\cdot\mathbf{E} = 2|\langle S|E\rangle|^2 - 1. \qquad (5.16)$$


Since the right-hand side is experimentally meaningful, so is the left. In fact, Eq. (5.16) is just a redefinition of the probability scale. If $|S\rangle = |E\rangle$, then the probability of $|S\rangle$ passing through an E-filter is unity, as is $\mathbf{S}\cdot\mathbf{E}$. On the other hand, if $|S\rangle = |E'\rangle$, where $|E'\rangle$ is orthogonal to $|E\rangle$, then the probability of $|S\rangle$ passing through the E-filter is zero and $\mathbf{S}\cdot\mathbf{E} = -1$.

Fig. 5.1. Projection of the Poincaré vector S on the direction E-E'.

Therefore, $\mathbf{S}\cdot\mathbf{E}$ is a measure of the probability of transmission through an E-filter, with $+1$ indicating complete transmission and $-1$ indicating complete absorption. We can now give operational instructions for the measurement of $\mathbf{S}\cdot\mathbf{E}$. The state $|S\rangle$ is sent into a resolver, such as that shown in Fig. 5.2, which splits it into the states $|E\rangle$ and $|E'\rangle$. Counters $C$ and $C'$ are placed in the upper and lower beams. We may arrange that these counters also perform a time averaging of their counting rates, so that if

$$|S\rangle = \langle E|S\rangle|E\rangle + \langle E'|S\rangle|E'\rangle, \qquad (5.17)$$

the output of counter $C$ is $|\langle E|S\rangle|^2$ and that of counter $C'$ is $|\langle E'|S\rangle|^2$. The outputs of these counters are then fed into devices which multiply the output of $C$ by $+1$ and the output of $C'$ by $-1$. Finally, the outputs of the multipliers are fed into an adder which takes their algebraic sum. Let us now show that this final output is simply $\mathbf{S}\cdot\mathbf{E}$. Remembering that $|S\rangle$ is a normalized state with

$$|\langle E|S\rangle|^2 + |\langle E'|S\rangle|^2 = 1,$$

we may follow the experimental procedure algebraically and obtain

$$\text{Final output} = (+1)|\langle E|S\rangle|^2 + (-1)|\langle E'|S\rangle|^2 = 2|\langle S|E\rangle|^2 - 1.$$

But, by Eq. (5.16), this is just $\mathbf{S}\cdot\mathbf{E}$.

We now find another way to write $\mathbf{S}\cdot\mathbf{E}$. Consider the matrix $M$ whose eigenvectors are just the state vectors $|E\rangle$ and $|E'\rangle$, and whose eigenvalues are $+1$ and $-1$. One might doubt that such a matrix exists, were an explicit form for it not so easy to find by using Eq. (5.14). In terms of the bras and kets of $E$ and $E'$ we may write

$$M = |E\rangle\langle E| - |E'\rangle\langle E'|. \qquad (5.18)$$

Now, in terms of this matrix, we prove that $\mathbf{S}\cdot\mathbf{E} = \langle S|M|S\rangle$. Using the forms given in Eqs. (5.17) and (5.18), we obtain

$$\langle S|M|S\rangle = |\langle E|S\rangle|^2 - |\langle E'|S\rangle|^2 = \mathbf{S}\cdot\mathbf{E}. \qquad (5.19)$$

Fig. 5.2. Operational definition for the measurement of the quantity S·E.
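The three expressions for $\mathbf{S}\cdot\mathbf{E}$ (the Poincaré-vector projection, the resolver with weighted counters, and $\langle S|M|S\rangle$) can be compared numerically. This NumPy sketch assumes that the unit Poincaré vector of a normalized ket is given by the expectation values of three Pauli-type matrices, with $P_2$ and $P_3$ proportional to the real and imaginary parts of $A_x^*A_y$; that representation and the particular states are assumptions of the sketch. The counters are modeled by their ideal time-averaged outputs $|\langle E|S\rangle|^2$ and $|\langle E'|S\rangle|^2$.

```python
import numpy as np

# Assumed representation: Poincare vector = (<sigma_1>, <sigma_2>, <sigma_3>)
SIGMAS = [np.array([[1, 0], [0, -1]], dtype=complex),
          np.array([[0, 1], [1, 0]], dtype=complex),
          np.array([[0, -1j], [1j, 0]], dtype=complex)]

def normalize(v):
    v = np.asarray(v, dtype=complex)
    return v / np.linalg.norm(v)

def poincare(ket):
    ket = normalize(ket)
    return np.array([np.real(ket.conj() @ s @ ket) for s in SIGMAS])

S = normalize([0.8, 0.3 + 0.5j])                            # arbitrary pure state |S>
E = normalize([np.cos(0.3), np.sin(0.3) * np.exp(0.7j)])    # the E-filter state
E_perp = np.array([-E[1].conj(), E[0].conj()])              # orthogonal state |E'>

p_E = abs(E.conj() @ S) ** 2                  # output of counter C
p_Eperp = abs(E_perp.conj() @ S) ** 2         # output of counter C'
resolver_output = (+1) * p_E + (-1) * p_Eperp # multipliers followed by the adder

M = np.outer(E, E.conj()) - np.outer(E_perp, E_perp.conj())   # Eq. (5.18)

print(poincare(S) @ poincare(E))              # S . E from the Poincare vectors
print(2 * p_E - 1)                            # Eq. (5.16)
print(resolver_output)                        # resolver with counters
print(np.real(S.conj() @ M @ S))              # Eq. (5.19)
```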




Now, although $\mathbf{S}\cdot\mathbf{E}$ is physically meaningful and may be measured, the projection of one Poincaré vector on another may seem a somewhat abstract concept. In one particular case, however, the significance of Eq. (5.19) is immediate. Consider the operator

$$M = \hbar\sigma_3. \qquad (5.20)$$

The eigenstates of $\sigma_3$ are $|R\rangle$ and $|L\rangle$, so we have

$$|S\rangle = \langle R|S\rangle|R\rangle + \langle L|S\rangle|L\rangle, \qquad (5.21)$$

giving

$$\langle S|M|S\rangle = \langle S|\hbar\sigma_3|S\rangle = \hbar|\langle R|S\rangle|^2 - \hbar|\langle L|S\rangle|^2.$$

Since $+\hbar$ and $-\hbar$ are the spin angular momenta of R- and L-states, $\langle S|\hbar\sigma_3|S\rangle$ is the average angular momentum along the direction of propagation which will be measured for the state $|S\rangle$, and $\hbar\sigma_3$ is the operator corresponding to the component of angular momentum in the direction of motion. Two important points are to be found in this example. First, we have seen how, in one case, a measurement may be related to the projection of one vector along another. Second, we have associated an operator $M$ with a measurement and have seen that the average result of the measurement on a state $|S\rangle$ is given by $\langle S|M|S\rangle$. This second point is examined in more detail below.
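In units with $\hbar = 1$, the following NumPy sketch evaluates $\langle S|\hbar\sigma_3|S\rangle$ for an arbitrary normalized state and compares it with $\hbar|\langle R|S\rangle|^2 - \hbar|\langle L|S\rangle|^2$. The explicit matrix for $\sigma_3$ and the phase convention for $|R\rangle$ and $|L\rangle$ are assumptions; any consistent choice with $\sigma_3|R\rangle = |R\rangle$ and $\sigma_3|L\rangle = -|L\rangle$ gives the same result.

```python
import numpy as np

hbar = 1.0                                          # units in which hbar = 1
sigma3 = np.array([[0, -1j], [1j, 0]])              # assumed explicit form of sigma_3
R = np.array([1, 1j]) / np.sqrt(2)                  # assumed convention for |R>
L = np.array([1, -1j]) / np.sqrt(2)                 # and for |L>

S = np.array([np.cos(0.5), np.sin(0.5) * np.exp(0.9j)])   # arbitrary normalized state

M = hbar * sigma3                                   # Eq. (5.20)
expect = np.real(S.conj() @ M @ S)
via_RL = hbar * abs(R.conj() @ S) ** 2 - hbar * abs(L.conj() @ S) ** 2
print(expect, via_RL)                               # equal
print(np.real(R.conj() @ M @ R), np.real(L.conj() @ M @ L))   # +hbar and -hbar
```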


5.4 EXPECTATION VALUES 


In the preceding section we have seen that, when $M = \hbar\sigma_3$, $\langle S|M|S\rangle$ yields the measured average value of the component of angular momentum along the direction of motion of the state $|S\rangle$. For individual photons this component is either $+\hbar$ or $-\hbar$, so that $\langle S|M|S\rangle$ predicts the outcome of every single count exactly only if $|S\rangle$ is an R- or an L-state. In the general case, however, there is no better guess of the outcome. Because of this, the value given by $\langle S|M|S\rangle$ is called the expectation value for the angular momentum component.

We wish to generalize this procedure, which we have given for a very particular measurement. Since we are going to generalize, let us go all the way. Assume, therefore, absolutely any physical system and some measurable quantity associated with that system: energy, momentum, polarization, etc. For simplicity, we will assume here that the measured values of this quantity can take on only a finite set of discrete values, but we will later remove this restriction. Let the possible values resulting from our measurement be the set of real numbers $\lambda_i$. If the measurement results in the particular value $\lambda_i$, then the state of the system which yields this value will be called $S_i$ and will be represented by a ket symbol $|S_i\rangle$. These are the pure states of our system with respect to the experiment which we are considering. We assume that any state can be written as a superposition of these states as

$$|S\rangle = \sum_i a_i|S_i\rangle, \qquad (5.22)$$

and that the pure states are orthonormal. We now form the operator

$$M = \sum_i \lambda_i|S_i\rangle\langle S_i|. \qquad (5.23)$$

This operator is a matrix, of course, if we represent our bras and kets by row and column matrices, respectively. We do not rule out the possibility that other mathematical representations can be found, but these must also obey all the formal mathematical machinery we have developed. The operator $M$ has two interesting properties. First, its eigenkets are the pure states $|S_i\rangle$ and its eigenvalues are the measured quantities $\lambda_i$,

$$M|S_i\rangle = \lambda_i|S_i\rangle, \qquad (5.24)$$

as we have seen in Section 5.2. The second property of $M$ is

$$\langle S|M|S\rangle = \sum_i \lambda_i a_i^* a_i, \qquad (5.25)$$

which is easily verified with the aid of Eqs. (5.22) and (5.23). Following the usual interpretation, $a_i^* a_i$ is the probability that the state $|S\rangle$ will be detected in the state $|S_i\rangle$. Furthermore, if $|S\rangle$ is detected in $|S_i\rangle$, then the measurement will result in the eigenvalue $\lambda_i$. Thus, Eq. (5.25) represents a weighted average of the possible outcomes of an experiment, and the quantity $\langle S|M|S\rangle$ is called the expectation value of $M$ because each eigenvalue $\lambda_i$ of $M$ is weighted according to the probability of measuring that value. It should be clear that, if $|S\rangle = |S_i\rangle$, the expectation value is simply the eigenvalue $\lambda_i$.
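Equation (5.25) can be illustrated with a three-outcome example. The NumPy sketch below uses hypothetical measured values $\lambda_i$ and amplitudes $a_i$, builds $M$ from Eq. (5.23) with the standard basis kets playing the role of the pure states, and compares $\langle S|M|S\rangle$ with the probability-weighted average $\sum_i \lambda_i|a_i|^2$.

```python
import numpy as np

lam = np.array([-1.0, 0.5, 2.0])                   # hypothetical measured values lambda_i
basis = np.eye(3, dtype=complex)                   # orthonormal pure states |S_i> (columns)

# Eq. (5.23): M = sum_i lambda_i |S_i><S_i|
M = sum(l * np.outer(basis[:, i], basis[:, i].conj()) for i, l in enumerate(lam))

a = np.array([0.6, 0.48j, 0.64])                   # hypothetical amplitudes a_i, Eq. (5.22)
a = a / np.linalg.norm(a)                          # so that sum_i |a_i|^2 = 1
S = basis @ a                                      # |S> = sum_i a_i |S_i>

expect = np.real(S.conj() @ M @ S)                 # <S|M|S>
weighted = np.sum(lam * np.abs(a) ** 2)            # Eq. (5.25)
print(expect, weighted)                            # both 0.5744 (to round-off)
print(np.real(basis[:, 2].conj() @ M @ basis[:, 2]))   # a pure state gives its eigenvalue
```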

If $M$ is looked upon as a resolver, as in Section 3.3, and if an unlimited number of physical entities is prepared such that each entity is in the same state $|S\rangle$, then the operational significance of Eq. (5.25) becomes transparent. In Fig. 5.3, $M$ is shown as a resolver, where $|S\rangle$ is the entrant state and the $D_i$ are counters. It is evident that the eigenvalues $\lambda_i$ are dispersed among the output channels, with the result that no complete prediction can generally be made as to which exit channel any entity in the state $|S\rangle$ will use.

Fig. 5.3. A multichannel resolver.

However, since the number of counts in each detector $D_i$ is proportional to $|a_i|^2$, by counting enough entities we can approximate the expectation value of $M$ as closely as we please. The experimental value for any single count, on the other hand, will not generally coincide with the expectation value. This leads us to define a root-mean-square deviation from the expectation value.

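For a Hermitian matrix $M$, we define the ket

$$|\psi\rangle = [M - \langle M\rangle]|S\rangle,$$

where $\langle M\rangle = \langle S|M|S\rangle$ is real, so that the corresponding bra is $\langle\psi| = \langle S|[M - \langle M\rangle]$. Forming the product, we find

$$\begin{aligned} \langle\psi|\psi\rangle &= \langle S|(M - \langle M\rangle)(M - \langle M\rangle)|S\rangle \\ &= \langle S|M^2|S\rangle - 2\langle M\rangle\langle S|M|S\rangle + \langle M\rangle^2 \\ &= \langle M^2\rangle - \langle M\rangle^2. \end{aligned}$$

We see that $\langle\psi|\psi\rangle$ is the expectation value of $(M - \langle M\rangle)^2$ for the state $|S\rangle$. This is just the expectation value for the square of the deviation of the individual measurements from the average of the individual measurements. The quantity $\langle\psi|\psi\rangle$ will be designated $(\Delta M)^2$. The root-mean-square deviation $\Delta M$ is then

$$\Delta M = \sqrt{\langle\psi|\psi\rangle} = \sqrt{\langle S|(M - \langle M\rangle)^2|S\rangle},$$

which will vanish only if

$$M|S\rangle = \langle M\rangle|S\rangle,$$

i.e., if $\langle M\rangle$ is an eigenvalue of $M$ corresponding to the eigenstate $|S\rangle$.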
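The following NumPy sketch, for a hypothetical three-outcome system, computes $\Delta M$ both as $\sqrt{\langle\psi|\psi\rangle}$ and as $\sqrt{\langle M^2\rangle - \langle M\rangle^2}$, then imitates the multichannel resolver of Fig. 5.3 by drawing outcomes $\lambda_i$ with probabilities $|a_i|^2$. The sample mean and standard deviation approach $\langle M\rangle$ and $\Delta M$ as the number of entities grows; the sample size and random seed are arbitrary.

```python
import numpy as np

lam = np.array([-1.0, 0.5, 2.0])           # hypothetical eigenvalues lambda_i
a = np.array([0.6, 0.48j, 0.64])           # normalized amplitudes a_i
M = np.diag(lam).astype(complex)           # M in the basis of its own eigenkets
S = a / np.linalg.norm(a)                  # |S> expressed in that basis

mean = np.real(S.conj() @ M @ S)                         # <M>
psi = (M - mean * np.eye(3)) @ S                         # |psi> = [M - <M>]|S>
delta_M = np.sqrt(np.real(psi.conj() @ psi))             # sqrt(<psi|psi>)
alt = np.sqrt(np.real(S.conj() @ M @ M @ S) - mean**2)   # sqrt(<M^2> - <M>^2)

# Simulated resolver: each entity exits channel i with probability |a_i|^2
rng = np.random.default_rng(4)
p = np.abs(S) ** 2
counts = rng.choice(lam, size=200_000, p=p / p.sum())
print(mean, counts.mean())                 # sample mean approaches <M>
print(delta_M, alt, counts.std())          # sample spread approaches Delta M

eig = np.eye(3)[:, 1]                      # an eigenstate of M: the deviation vanishes
residual = np.linalg.norm((M - np.real(eig @ M @ eig) * np.eye(3)) @ eig)
print(residual)
```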


5.5 THE POSTULATES OF QUANTUM MECHANICS 


Put imprecisely, quantum mechanics postulates that the formalism developed in the previous sections can be applied to any measurement. That is, for every measurement there exists an operator $M$ such that $\langle S|M|S\rangle$ yields the expectation value of the measurement, the pure states of the measurement are represented by the eigenkets of $M$, and the eigenvalues of $M$ are the results obtained with certainty when the measurement is made on those pure states.

Let us see why, in one sense, this postulate is trivially true. If we assume a complete knowledge of the possible physical states in which a system may exist, then we know the values of all measurables which will be obtained in any experiment using one of these states. To indicate how the formalism might apply, assume some system in which some measurable quantity may take on values $\lambda_i$ where, just as an example, there are 25 different possible values of $\lambda_i$. Twenty-five different orthonormal kets may be constructed corresponding to these states by letting all the components of the ket $|S_i\rangle$ corresponding to the $i$-th state, in a column representation, be zero except the $i$-th one, which we set equal to unity. This yields 25 different 25-dimensional orthonormal ket vectors. The operator associated with the measurement is given by Eq. (5.14):

$$M = \sum_{i=1}^{25} \lambda_i|S_i\rangle\langle S_i|.$$

In this sense the postulate is trivially true and amounts to a game of playing with words. Being trivially true, it is also worthless, for the pursuit of physics consists precisely in determining the possible states of systems and the values of the measurables which we will obtain using these systems.

Let us look at this from another point of view. What would happen if, 
with no recourse to Eq. (5.14), we could by physical intuition, mathematical 
insight, or even a lucky guess, determine a priori the operator M which 
corresponds to a given measurable quantity? Well, first we would try to 
solve the eigenvalue equation 


$$M|S_i\rangle = \lambda_i|S_i\rangle.$$

If we could do this, we would have found the pure states $|S_i\rangle$ of the system, and we would have derived the possible values the measurable could take on, the $\lambda_i$. In principle, then, the main problem of quantum mechanics is the
determination of the correct operators. We shall find some of these in the 
following section. In practice, however, the main problem is often to solve 
the eigenvalue equation after we have the operator. 

A necessary restriction, which we have not mentioned, is that the operators which correspond to measurables must be Hermitian. This ensures that their eigenvalues will be real numbers, corresponding to the real numbers resulting from experiment. To summarize, then, quantum mechanics assumes

a) that to every physical system and to each measurable of that system there corresponds a Hermitian operator $M$;

b) that the states of the system which are pure with respect to an experiment to measure this quantity are the eigenstates $|S_i\rangle$ of $M$, and that in the state $|S_i\rangle$ the measurement will yield the corresponding eigenvalue $\lambda_i$; and

c) that any state $|S\rangle$ may be expanded in terms of the pure states $|S_i\rangle$, and that the expectation value of the measurement in the state $|S\rangle$ is $\langle S|M|S\rangle$.


5.6 THE TIME OPERATOR AND COMMUTATION 


Until now we have been concerned only with the polarization aspect of light, and we have ignored the possibility that the kets which describe a state must, in general, be functions of position and time. A very simple example is that of a P-state traversing a length of optically active material. As the line of the P-state rotates, we may think of the polarization state either as a function of its position along the material or as a function of time as it traverses the material. Furthermore, the bracket product, which can be related to intensity, must in general be a function of position and time: some places in the universe are darker than others, and the dark places and bright places are not fixed in time.

Remember that, when we write $|S\rangle$, the symbol $|\;\rangle$ indicates that we are talking about a ket, and $S$ plays the role of a label which names the state and distinguishes it from all other states which differ from it in any way, whether through polarization or, now, through space or time dependence. Let us now use a new label, $\alpha$, only for polarization and write $|S\rangle = |\alpha, \mathbf{r}, t\rangle$ to indicate differences in polarization and in space and time dependence. Use of the vector symbol $\mathbf{r}$ indicates only that the ket depends on the spatial position; $|\alpha, \mathbf{r}, t\rangle$ is shorthand for $|\alpha, x, y, z, t\rangle$.

When we deal with ordinary functions of position, it sometimes happens 
that the different coordinates appear in such a way that a function of x, y, 
and z may be written as a product of functions, each depending on only one 
variable, f(x, y, z) = g(x)h(y)w(z). In the same way, it will sometimes happen 
that the various dependences of a ket, or bra, may be separated from one 
another. If this is the case, we will write 


$$|\alpha, \mathbf{r}, t\rangle = u(\mathbf{r})\,w(t)\,|\alpha\rangle.$$


This is not always possible, but it will occur often enough to make the nota- 
tion useful. Note that we have included the space and time dependence by 
the use of scalar functions. We shall see later that this dependence is not 
necessarily of such a simple form. 

For the moment, let us concentrate explicitly only on the time dependence and write $|S\rangle = |t\rangle$, where $|S\rangle$ may, of course, depend on things other than $t$. If the state varies from one time, $t_1$, to another time, $t_2$, we have $|S_1\rangle = |t_1\rangle$ and $|S_2\rangle = |t_2\rangle$. Now we are going to assume that there exists a time operator which is defined by the equation

$$|S_2\rangle = U(t_2, t_1)|S_1\rangle.$$

That is, there is some operator which, acting on the state at time $t_1$, gives us the state at time $t_2$.

Now, although we may assume that U exists, we have not shown that 
it is possible to find such an operator. We must solve this problem in two 
stages. 

First, if $|S_1\rangle$ and $|S_2\rangle$ are known, then we obviously know $U$, for it is given by the matrix $|S_2\rangle\langle S_1|$, as is easily shown if $\langle S_1|S_1\rangle = 1$. However, if we know only $|S_1\rangle$, we are asking whether this is sufficient to predict the state at a later time. Now, we have gone to considerable trouble to show that complete prediction is not always possible, so it would seem that $U$ does not always exist.

Let us phrase the question a little differently. If all the information necessary to predict the statistical outcome of an experiment on a state at time $t_1$ is contained in $|S_1\rangle$, is it possible to determine from this the statistical outcome of the same experiment at a later time, $t_2$? If this much is not possible, then there is very little left of physics. We therefore assume that, in principle, the operator $U$ can be found.

Let us assume that we have a state at time $t_1$ and a time operator which tells us how $|S_1\rangle$ develops in time. Also, let $G$ be any operator associated with an observable quantity whose expectation value is $g$. Then we have the following:

$$\begin{aligned} \langle S_1|G|S_1\rangle &= g_1, \\ \langle S_2|G|S_2\rangle &= \langle S_1|U^\dagger G U|S_1\rangle = g_2. \end{aligned}$$


Now, it sometimes happens that for two particular operators the order in which they are applied is of no importance. This is not always true, but let us assume for the moment that it is true for $G$ and $U$:

$$GU = UG, \qquad G = U^\dagger G U.$$

Here we have demanded that $U$ be unitary, $U^\dagger U = 1$, so that the second equation follows from the first on multiplying from the left by $U^\dagger$, and so that

$$\langle S_2|S_2\rangle = \langle S_1|U^\dagger U|S_1\rangle = \langle S_1|S_1\rangle;$$

that is, the normalization of the state is time independent. If the order in which two operators are applied is unimportant, they are said to commute. We see then that, if $U$ and $G$ commute, we must have

$$\langle S_2|G|S_2\rangle = \langle S_1|G|S_1\rangle.$$

In other words, if we measure $g$ at time $t_2$, we get the same expectation value that we get at time $t_1$. This is just another way of saying that $g$ is constant, or that $g$ is a "constant of the motion".
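A small numerical sketch of this argument, assuming NumPy. Since no physical time operator has yet been derived, $U$ is manufactured from an arbitrary Hermitian matrix $H$ as $V\,\mathrm{diag}(e^{-iw_kt})\,V^\dagger$, where $w_k$ and $V$ are the eigenvalues and eigenvectors of $H$; this is a hypothetical stand-in that is unitary by design. An operator built from $H$ commutes with $U$ and keeps its expectation value, while a generic operator does not commute with $U$.

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
H = X + X.conj().T                                  # hypothetical Hermitian generator
w, V = np.linalg.eigh(H)

def U(t):
    """Hypothetical unitary time operator V diag(e^{-i w_k t}) V^dagger."""
    return V @ np.diag(np.exp(-1j * w * t)) @ V.conj().T

def expval(G, S):
    return np.real(S.conj() @ G @ S)

G_const = H @ H                                     # a function of H: commutes with U(t)
G_other = np.diag([1.0, -1.0, 0.0]).astype(complex) # a generic observable

S1 = np.array([1.0, 1j, 0.5]) / 1.5                 # normalized: |1|^2 + |i|^2 + 0.5^2 = 1.5^2
t = 0.8
S2 = U(t) @ S1                                      # |S_2> = U |S_1>

print(np.allclose(U(t).conj().T @ U(t), np.eye(3))) # U is unitary
print(np.linalg.norm(S2))                           # normalization preserved: 1.0
print(np.allclose(G_const @ U(t), U(t) @ G_const),  # commutes ...
      expval(G_const, S1), expval(G_const, S2))     # ... so the expectation value is unchanged
print(np.allclose(G_other @ U(t), U(t) @ G_other),  # does not commute ...
      expval(G_other, S1), expval(G_other, S2))     # ... so it generally changes
```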

We then can say that if the operator associated with some observable 
commutes with the time operator U, that observable is a constant. There 
are many operators which do not commute with U, but in the following 
sections we shall show how it is possible to find some which do. 

As an important example, we now find one operator which commutes 
with U. 

In a previous chapter we often found it convenient to perform coordinate 
transformations on various operators and the bra and ket vectors. At that 




time the states and operators were considered only with respect to polariza- 
tion, but the idea of coordinate transformations is of utmost importance for 
all observable aspects of a physical system. 

Let us review briefly the formalism of such transformations. We recall 
that a spatial transformation can be viewed either as a physical transformation 
of the system or as a mathematical change of coordinates leaving the system 
intact. The translation of a physical system an amount x to the right is 
equivalent to moving the coordinate system that amount to the left. But it 
must be pointed out that this equivalence is true only if the entire system of 
interest is translated. For example, if the earth were translated within the 
sun's gravitational field, the physical state of the earth would change; its 
potential energy would change. If, on the other hand, the entire solar system 
were translated, then, ignoring the effect of the distant stars, we would have 
performed only a coordinate change. Putting this another way: if we move 
the entire universe, there is no way to tell that we did it; we have only 
changed coordinate systems. 

Now, let us perform such a transformation on some total physical system. If $|S\rangle$ is the original state and $|S\rangle'$ is the state after translation, then we assume some operator $T$ such that

$$|S\rangle' = T|S\rangle,$$

where $|S\rangle'$ expresses, in terms of the new coordinates, the information expressed by $|S\rangle$ in terms of the old coordinates. Not only the state vectors must transform, but also any operator which acts on those states. We have seen in previous chapters that, if $G$ is any operator which can act on $|S_1\rangle$ to yield $|S_2\rangle$, then after the transformation the new operator which acts on $|S_1\rangle'$ to yield $|S_2\rangle'$ is

$$G' = TGT^\dagger.$$

In particular, the time operator $U$ becomes $U'$ after the transformation of coordinates, where

$$U' = TUT^\dagger.$$

If in the transformation $T$ we have moved all parts of the system of interest, the system must develop in time just as it did before. That is, the same operator that takes $|t_1\rangle$ into $|t_2\rangle$ should also take $|t_1\rangle'$ into $|t_2\rangle'$: applying $U$ to $|S\rangle'$ should yield the same physical state as applying it to $|S\rangle$. Within the formalism, however, the correct operator to apply to $|S\rangle'$ is $U'$. Thus, since it does not matter whether we use $U$ or $U'$, we must have $U = U'$. Thus,

$$U = TUT^\dagger, \qquad TU = UT,$$

the second form following from the first on multiplying from the right by $T$, since $T$ is unitary ($T^\dagger T = 1$).




Therefore, by making the assumption that the particular location in space 
of a system does not affect its time development, which is the same as saying 
that space is everywhere the same, we find that the time operator and the 
translation operator commute. 
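A toy model makes this conclusion concrete. In the NumPy sketch below, which is an illustrative assumption rather than a model taken from the text, space is replaced by a ring of $n$ sites, $T$ is the permutation matrix that shifts every amplitude one site to the right, and the time operator is $U = e^{-iHt}$ built from a Hermitian $H$. When $H$ couples neighboring sites in the same way everywhere, $T$ and $U$ commute. Adding a site-dependent potential, which plays the role of an external field (like the sun's gravity above) that is not translated along with the system, destroys the commutation.

```python
import numpy as np

n = 8
T = np.roll(np.eye(n), 1, axis=0)          # T|site j> = |site j+1> on a ring of n sites

H_uniform = -(T + T.T)                     # same nearest-neighbor coupling everywhere
H_nonuniform = H_uniform + np.diag(np.linspace(0.0, 1.0, n))   # untranslated "field"

def time_operator(H, t):
    """U = exp(-i H t), built from the eigen-decomposition of the Hermitian H."""
    w, V = np.linalg.eigh(H)
    return V @ np.diag(np.exp(-1j * w * t)) @ V.conj().T

for name, H in [("uniform", H_uniform), ("nonuniform", H_nonuniform)]:
    U = time_operator(H, 0.7)
    print(name, np.allclose(T @ U, U @ T))  # uniform: True, nonuniform: False
```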


5.7 AN EXAMPLE OF A TIME OPERATOR 


In order to make the preceding less abstract and also to develop a little more 
mathematics, we shall explicitly calculate the time operator for a particular 
system. 

All the possible pure polarization states which have been considered until 
now were time independent. Such states are frequently called stationary 
states. Consider an E-state obtained from a coherent superposition of 
orthogonal P-states in the interferometer shown in Fig. 2.3, where the R- and 
L-filters in each arm have been replaced by a P-filter in each arm. If the lines 
of the P-filters are at 90° relative to each other, then the interferometer forms 
E-states through a coherent superposition, which can be expressed as 


$$|E\rangle = A_x e^{-i\phi/2}|P_x\rangle + A_y e^{i\phi/2}|P_y\rangle. \qquad (5.26)$$


There are several ways in which the state |E» could be forced to vary with 
time. One method would be to vary periodically the light absorbed by the 
absorbers A and A' while maintaining a constant retarder thickness. This would have the effect of making the amplitudes $A_x$ and $A_y$ functions of time without affecting the phase difference $\phi$. Another method would be to keep the absorbers fixed, thereby keeping the amplitudes $A_x$ and $A_y$ constant in time, and to vary the retarder thickness so that the phase factors $e^{\mp i\phi/2}$ become periodic functions of time. We will restrict our attention to the latter method, to illustrate the implications and consequences for time-varying pure states.

So, then, let us assume that we have a technique for varying the retarder thickness so that the phase difference is given by $\phi = \omega t$, where the radian frequency $\omega$ is constant and $t$ is the time. Also assume that the absorbers are not permitted to vary and that the P-filters maintain a constant orientation relative to the entire interferometer. This latter assumption implies, for example, that the P-filter in an arm of the interferometer which is pure in relation to the state $|P_x\rangle$ remains pure in relation to $|P_x\rangle$. This ensures that, while $|E\rangle$ varies with time, the basis kets $|P_x\rangle$ and $|P_y\rangle$ are independent of time.

Substituting $\phi = \omega t$ into Eq. (5.26), which represents the state of the light emerging from the interferometer, and differentiating the resulting equation, we have

$$\frac{d}{dt}|E(t)\rangle = |\dot{E}(t)\rangle = -\frac{i\omega}{2}\left[A_x e^{-i\omega t/2}|P_x\rangle - A_y e^{i\omega t/2}|P_y\rangle\right]. \qquad (5.27)$$




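Now, we notice that the term in brackets can be obtained by operating on $|E(t)\rangle$ with the sigma matrix $\sigma_1$ of Eq. (4.9):

$$\sigma_1|E(t)\rangle = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}\begin{pmatrix} A_x e^{-i\omega t/2} \\ A_y e^{i\omega t/2} \end{pmatrix} = \begin{pmatrix} A_x e^{-i\omega t/2} \\ -A_y e^{i\omega t/2} \end{pmatrix} = A_x e^{-i\omega t/2}|P_x\rangle - A_y e^{i\omega t/2}|P_y\rangle,$$

where we are using the representation

$$|P_x\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \qquad |P_y\rangle = \begin{pmatrix} 0 \\ 1 \end{pmatrix}.$$

As a result, Eq. (5.27) can be written as

$$\frac{d}{dt}|E(t)\rangle = -\frac{i\omega}{2}\,\sigma_1|E(t)\rangle. \qquad (5.28)$$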


The equation above may ring a bell because it has the form of the very simple differential equation

$$\frac{dx}{dt} + ikx = 0, \qquad (5.29)$$

which has the solution $x = x_0e^{-ikt}$, where $t$ is a real variable, $k$ is a real constant, and $x_0$ is a constant (so that $x$ is, in general, complex). Can a solution of the same form be written for Eq. (5.28)? It can, though the exponential must then involve a matrix operator in place of the scalar $k$.
We must first determine the meaning of an expression such as $e^S$, where $S$ is a matrix. For a real variable $x$ we know that

$$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots + \frac{x^n}{n!} + \cdots,$$

and the significance of $e^S$, where $S$ is a matrix, is the same as the series for $e^x$, that is,

$$e^S = 1 + S + \frac{S^2}{2!} + \cdots + \frac{S^n}{n!} + \cdots,$$

where the first term in the expansion is a unit matrix. Just as $e^x$ can be differentiated with respect to a given variable, so, too, can $e^S$. However, the derivative of a matrix such as

$$S = \begin{pmatrix} s_{11} & s_{12} \\ s_{21} & s_{22} \end{pmatrix}$$

involves the derivative of each and every matrix element, i.e.,

$$\dot{S} = \begin{pmatrix} \dot{s}_{11} & \dot{s}_{12} \\ \dot{s}_{21} & \dot{s}_{22} \end{pmatrix},$$

where the dot over the matrix elements indicates differentiation with respect to a given variable, say $d/dt$. The series expansion for $e^S$ can be differentiated term by term, but we must know how to differentiate the product of two matrices. Suppose that we have two matrices $A$ and $B$ whose product is $C$, i.e., $AB = C$. From Appendix I we know that


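$$c_{ik} = \sum_j a_{ij}b_{jk},$$

and by differentiating both sides of the above equation we have

$$\dot{c}_{ik} = \sum_j \left(\dot{a}_{ij}b_{jk} + a_{ij}\dot{b}_{jk}\right),$$

which implies that

$$\frac{dC}{dt} = \frac{dA}{dt}B + A\frac{dB}{dt}.$$

If $A = B = S$, then

$$\frac{dS^2}{dt} = S\frac{dS}{dt} + \frac{dS}{dt}S.$$

In general, a matrix $S$ does not commute with its derivative and, hence,

$$\frac{dS^2}{dt} \neq 2S\frac{dS}{dt}.$$

However, the matrices we shall be using are of the form

$$S = tC,$$

where $C$ is a matrix of constants. In this case $S$ commutes with $dS/dt = C$, and we can write

$$\frac{dS^2}{dt} = 2S\frac{dS}{dt},$$

and from this it can be shown by induction that

$$\frac{dS^n}{dt} = nS^{n-1}\frac{dS}{dt}.$$

It is left as an exercise to show from the series expansion of $e^S$ that

$$\frac{d}{dt}e^S = e^S\frac{dS}{dt}.$$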
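These matrix-calculus rules can be spot-checked numerically. The NumPy sketch below defines $e^S$ by the truncated power series above, differentiates matrix-valued functions element by element with central differences, and confirms $dS^n/dt = nS^{n-1}\,dS/dt$ and $de^S/dt = e^S\,dS/dt$ for $S = tC$. It also exhibits a matrix $S(t)$ that does not commute with its derivative, for which the product rule holds but $2S\,dS/dt$ fails. The constant matrix, step size, and number of series terms are arbitrary choices.

```python
import numpy as np

def expm_series(S, terms=40):
    """e^S from the power series 1 + S + S^2/2! + ..., truncated after `terms` terms."""
    result = np.eye(S.shape[0], dtype=complex)
    term = np.eye(S.shape[0], dtype=complex)
    for n in range(1, terms):
        term = term @ S / n                  # S^n / n!
        result = result + term
    return result

def ddt(F, t, h=1e-6):
    """Central-difference derivative of a matrix-valued function, element by element."""
    return (F(t + h) - F(t - h)) / (2 * h)

C = np.array([[0.3, 1.0], [-0.5, 0.2j]])     # arbitrary constant matrix
S = lambda t: t * C                          # S = tC, so dS/dt = C commutes with S
t0, n = 0.9, 4

lhs = ddt(lambda t: np.linalg.matrix_power(S(t), n), t0)
rhs = n * np.linalg.matrix_power(S(t0), n - 1) @ C
print(np.allclose(lhs, rhs, atol=1e-6))                 # d(S^n)/dt = n S^(n-1) dS/dt
print(np.allclose(ddt(lambda t: expm_series(S(t)), t0),
                  expm_series(S(t0)) @ C, atol=1e-6))   # d(e^S)/dt = e^S dS/dt

# Counterexample: S(t) = [[t, t^2], [0, 0]] does not commute with its derivative
S2 = lambda t: np.array([[t, t * t], [0.0, 0.0]])
dS2 = np.array([[1.0, 2 * t0], [0.0, 0.0]])
lhs2 = ddt(lambda t: S2(t) @ S2(t), t0)
print(np.allclose(lhs2, S2(t0) @ dS2 + dS2 @ S2(t0)))   # product rule: True
print(np.allclose(lhs2, 2 * S2(t0) @ dS2))              # naive 2 S dS/dt: False
```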


Since Eqs. (5.28) and (5.29) have the same form, we are strongly tempted to write the solution of Eq. (5.28) as

$$|E(t)\rangle = e^{-i\omega t\sigma_1/2}|E(0)\rangle,$$

and, indeed, it is the solution, as we can verify by using the preceding result with $S = -i\omega t\sigma_1/2$. This says that the time development of $|E(t)\rangle$ is related to $|E(0)\rangle$ through the operator $e^{-i\omega t\sigma_1/2}$. For this example,

$$U(t, 0) = e^{-i\omega t\sigma_1/2}.$$
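A numerical check that this $U(t, 0)$ solves Eq. (5.28) and reproduces Eq. (5.26), as a NumPy sketch with illustrative values of $A_x$, $A_y$, and $\omega$. Because $\sigma_1$ is diagonal, its exponential is just the diagonal matrix of the exponentials of its diagonal entries, which is what the helper uses.

```python
import numpy as np

sigma1 = np.diag([1.0, -1.0]).astype(complex)       # sigma_1 of Eq. (4.9), basis |P_x>, |P_y>

def U(t, omega):
    """e^{-i omega t sigma_1 / 2}; sigma_1 is diagonal, so exponentiate its diagonal."""
    return np.diag(np.exp(-0.5j * omega * t * np.diag(sigma1)))

Ax, Ay, omega = 0.6, 0.8, 2.0                         # illustrative constants
E0 = np.array([Ax, Ay], dtype=complex)                # |E(0)>: zero phase difference

def E(t):
    return U(t, omega) @ E0

t, h = 0.37, 1e-6
lhs = (E(t + h) - E(t - h)) / (2 * h)                 # d|E>/dt by central differences
rhs = -0.5j * omega * sigma1 @ E(t)                   # right-hand side of Eq. (5.28)
eq_526 = np.array([Ax * np.exp(-0.5j * omega * t),    # Eq. (5.26) with phi = omega t
                   Ay * np.exp(0.5j * omega * t)])
print(np.allclose(lhs, rhs, atol=1e-6), np.allclose(E(t), eq_526))
print(np.allclose(U(t, omega).conj().T @ U(t, omega), np.eye(2)))   # U is unitary
```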
A good idea of what the solution 
| E(o» =e totes /2) E(0)> 


means physically in this case can be obtained through the Poincaré sphere 
representation. Recall that the Stokes parameters are given by 


$$P_0 = A_x^2 + A_y^2, \qquad P_1 = A_x^2 - A_y^2,$$
$$P_2 = 2A_xA_y\cos\phi, \qquad P_3 = 2A_xA_y\sin\phi,$$

where $\phi = \omega t$, of course, is the difference in phase between the components.
Both $A_x$ and $A_y$ are constant and only $P_2$ and $P_3$ are functions of time. In
terms of the Poincaré sphere, the physical situation is shown in Fig. 5.4.
The endpoint of the vector $\mathbf{P}$ is restricted to move in a circle on the sphere.
Though $|E(t)\rangle$ varies in time, it is a pure state at each and every instant of
time, since it satisfies the requirement

$$P_0^2 = P_1^2 + P_2^2 + P_3^2.$$


Fig. 5.4. The effect of the time operator $e^{-i\sigma_1\omega t/2}$.

In addition, since $P_2$ and $P_3$ are periodic functions of time, the vector $\mathbf{P}$
precesses about the $P_1$-axis at the rate of $\omega$ rad/sec. It is evident from this
Poincaré sphere description that the Stokes parameter $P_1$ is a “constant of
the motion” in the sense described in Section 5.5. If it is a constant of the
motion, then the operator $\sigma_1$, which corresponds to the observable $P_1$,
should commute with the time operator:

$$\sigma_1\, e^{-i\sigma_1\omega t/2} = e^{-i\sigma_1\omega t/2}\,\sigma_1.$$


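By using the series expansion for the exponential and remembering that
$\sigma_1^N = \sigma_0$ if $N$ is even and $\sigma_1^N = \sigma_1$ if $N$ is odd, one can obtain

$$U = \sum_{N=0}^{\infty} \frac{1}{N!}\left(-\frac{i\omega t}{2}\right)^N \sigma_1^N
= \cos\left(\frac{\omega t}{2}\right)\sigma_0 - i\sin\left(\frac{\omega t}{2}\right)\sigma_1.$$

It is obvious that $\sigma_1$ commutes with itself and with $\sigma_0$. As a result, $\sigma_1$
commutes with $U$ and, hence, the observable $P_1$, associated with $\sigma_1$, should
be a constant of the motion, as we have shown.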


5.8 THE LINEAR MOMENTUM OPERATOR 


We have seen that for every operator which commutes with the time operator 
there is associated a constant of the motion, and we have shown that the 
translation operator commutes with U. We must now find the constant 
associated with T. Rather than consider a general translation, let us examine 
an infinitesimal translation of our system to the right along the x-axis. 
What we mean by a small translation is that the translation operator can be 
written as the identity operator plus another very small term. Thus, we write 
$$T \approx I - \frac{i\,\Delta x}{\hbar}\,P_x.$$

The presence of the $\Delta x$ is to ensure that we can make the second part as
small as we wish. The $i/\hbar$ is included so that we will not have to alter the
units of the operator $P_x$ later.

If we translate a function an amount $\Delta x$ to the right, then from Fig. 5.5
we see that the translated function at $x$ must have the same value as the old
function did at $x - \Delta x$, i.e., $Tf(x) = f(x - \Delta x)$. If $\Delta x$ is very small,
we can write by the methods of elementary calculus

$$f(x - \Delta x) = f(x) - \Delta x\,\frac{\partial f}{\partial x}.$$

Comparing this with $Tf = f - (i\,\Delta x/\hbar)P_x f$, we see that we must have

$$P_x = -i\hbar\,\frac{\partial}{\partial x}.$$

Fig. 5.5. Spatial translation of the function $f(x)$.
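The first-order relation between the translation operator $T$ and $P_x = -i\hbar\,\partial/\partial x$ can be checked on a sample function. In this sketch $\hbar$ is set to 1, $f(x) = e^{-x^2}$ and the step sizes are arbitrary choices, and the derivative is taken analytically; the gap between $f(x - \Delta x)$ and $(I - i\,\Delta x P_x/\hbar)f$ should shrink roughly as $(\Delta x)^2$, the size of the neglected second-order term.

```python
import numpy as np

hbar = 1.0   # units with hbar = 1 (assumption of the sketch)

def f(x):
    return np.exp(-x**2)

def Px_f(x):
    """P_x f = -i hbar df/dx, using the exact derivative of f(x) = exp(-x^2)."""
    return -1j * hbar * (-2 * x * np.exp(-x**2))

x = np.linspace(-3, 3, 601)
errors = {}
for dx in (1e-1, 1e-2, 1e-3):
    translated = f(x - dx)                           # T f: old value at x - dx
    first_order = f(x) - 1j * dx / hbar * Px_f(x)    # (I - i dx P_x / hbar) f
    errors[dx] = np.max(np.abs(translated - first_order))
    print(dx, errors[dx])                            # shrinks roughly like dx**2
```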


If $T$ commutes with $U$, it is easy to show that $P_x$ must also, and we are
going to find the constant associated with $P_x$ rather than $T$. Whatever it
may be, we know that it will be an eigenvalue of $P_x$ and that the corresponding
eigenstate of $P_x$ will be that pure state which possesses that eigenvalue. To
find these, we solve

$$-i\hbar\,\frac{\partial}{\partial x}\psi(x) = C\,\psi(x),$$

which has a solution

$$\psi(x) = A\,e^{iCx/\hbar}.$$

We have seen in Chapter 2 that light traveling along the x-axis has a
linear momentum of $p_x$ per photon and has a periodicity along the x-axis of
$\lambda = h/p_x$. But $e^{iCx/\hbar}$ has a periodicity of $2\pi\hbar/C = h/C$. Thus, we identify
the eigenvalues of $P_x$ with the x-component of linear momentum, $p_x$, and $P_x$ as a
linear momentum operator. We now have

$$\psi(x) = A\,e^{ip_x x/\hbar}.$$


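In a similar fashion the operators for the other components of linear
momentum are

$$P_y = -i\hbar\,\frac{\partial}{\partial y}, \qquad P_z = -i\hbar\,\frac{\partial}{\partial z}.$$

These may be combined to yield the vector linear momentum operator

$$\mathbf{P} = -i\hbar\nabla = -i\hbar\left(\mathbf{e}_x\frac{\partial}{\partial x} + \mathbf{e}_y\frac{\partial}{\partial y} + \mathbf{e}_z\frac{\partial}{\partial z}\right),$$

where $\nabla$ is the gradient operator and $\mathbf{e}_x$, $\mathbf{e}_y$, and $\mathbf{e}_z$ are unit vectors along the
three axes. The eigenstate for the operator $\mathbf{P}$ may be seen to be

$$\psi(x, y, z) = A\exp\left[\frac{i}{\hbar}(p_x x + p_y y + p_z z)\right]$$

and represents a photon whose momentum is not along any particular axis
but is given by

$$\mathbf{p} = p_x\mathbf{e}_x + p_y\mathbf{e}_y + p_z\mathbf{e}_z.$$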




5.9 THE TIME DEPENDENCE


It is obvious that any operator commutes with itself and in particular that 
$U$ commutes with $U$. We can therefore ask what constant of the motion is
associated with the time operator. We could proceed by analogy with the
previous section, but let us take a somewhat different approach. If we
consider the development of a system over an infinitesimal time interval, we
may write

$$|t + \Delta t\rangle = |t\rangle + \frac{\partial}{\partial t}|t\rangle\,\Delta t.$$

In terms of the time operator,

$$|t + \Delta t\rangle = U(t + \Delta t, t)\,|t\rangle.$$

Writing

$$U(t + \Delta t, t) = I - \frac{i\,\Delta t}{\hbar}\,H,$$

we find

$$H = i\hbar\,\frac{\partial}{\partial t}.$$

We then set up the eigenvalue equation for the time dependence of $|S\rangle$ and
obtain

$$i\hbar\,\frac{d\phi}{dt} = C\,\phi(t),$$

which has the solution

$$\phi(t) = A\,e^{-iCt/\hbar}.$$


Now, any phenomenon which is periodic in space at a given instant of 
time must also be periodic in time at a given point if it travels past that point 
with some constant velocity, c. In elementary courses it is shown that the 
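relation between the space periodicity, $\lambda$, and the time periodicity, $\tau$, is
given by $\lambda = \tau c$. Since we must have $\lambda = h/|\mathbf{p}|$,

$$\tau = \frac{h}{|\mathbf{p}|c}.$$

But $|\mathbf{p}|c = E$, the energy of the photon, so $\tau = h/E$. Since $e^{-iCt/\hbar}$ has
the period $h/C$, we must have $C = E$, and we thus have

$$\phi(t) = A\,e^{-iEt/\hbar}.$$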


The eigenvalue of H is thus identified as the energy of the photon and 
H as the energy operator. The total space and time dependence of the 
photon state having linear momentum $\mathbf{p}$ and energy $E$ can then be written

$$f(\mathbf{r}, t) = \psi(\mathbf{r})\,\phi(t) = A\exp\left[\frac{i}{\hbar}(p_x x + p_y y + p_z z - Et)\right].$$




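It is often convenient to define the wave vector $\mathbf{k} = (1/\hbar)\mathbf{p}$ with magnitude
$|\mathbf{k}| = 2\pi/\lambda$ and the symbol $\omega$ by $E = \hbar\omega$. Using these symbols, we may
write

$$f(\mathbf{r}, t) = A\,e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)}.$$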
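As a concrete check on the identification of $P_x$ and $H = i\hbar\,\partial/\partial t$, the sketch below applies both operators by finite differences to the one-dimensional plane wave $e^{i(kx - \omega t)}$ with $\omega = ck$, working in units with $\hbar = c = 1$ (an assumption of the sketch), and then converts an illustrative wavelength of 500 nm into $p$, $E$, $k$, and $\omega$ using the exact SI values of $h$, $c$, and the electron volt.

```python
import numpy as np

hbar, c = 1.0, 1.0       # natural units for the operator check (assumption)
k = 2.0                  # illustrative wave number
omega = c * k            # photon dispersion relation used in the text

def psi(x, t):
    return np.exp(1j * (k * x - omega * t))

x0, t0, h = 0.3, 0.1, 1e-5
Px_psi = -1j * hbar * (psi(x0 + h, t0) - psi(x0 - h, t0)) / (2 * h)   # -i hbar d/dx
H_psi = 1j * hbar * (psi(x0, t0 + h) - psi(x0, t0 - h)) / (2 * h)     # i hbar d/dt
print(Px_psi / psi(x0, t0), H_psi / psi(x0, t0))   # approximately hbar*k and hbar*omega

# SI numbers for a 500 nm photon (illustrative wavelength).
h_SI, c_SI, eV = 6.62607015e-34, 299792458.0, 1.602176634e-19
lam = 500e-9
p = h_SI / lam                                 # |p| from lambda = h/|p|
E = p * c_SI                                   # E = |p| c
k_SI = 2 * np.pi / lam                         # |k| = 2 pi / lambda
omega_SI = E / (h_SI / (2 * np.pi))            # E = hbar omega
print(f"p = {p:.3e} kg m/s, E = {E / eV:.3f} eV, k = {k_SI:.3e} 1/m, omega = {omega_SI:.3e} rad/s")
```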


5.10 FUNCTIONS AS VECTORS 


The functions which we have obtained as eigenstates of the momentum and 
energy operators differ from those states which we have previously considered 
in that the eigenvalues form a continuous range rather than take on only 
discrete values. Also, the operators which we have found are differential 
operators rather than matrices. We now want to see how all of this can be 
fitted into the formalism developed for the manipulation of bras and kets. 
What we are going to show is that in many respects the concept of a function 
and that of a vector are similar. We are going to find ways in which a function 
may be interpreted as a vector. The most obvious way to do this is by 
examining the concept of the integral of the product of two functions (which
may be complex), $f^*(x)$ and $g(x)$.

Now, it is well known that the definite integral of the product of these 
two functions may be defined in terms of a sum in the following manner: 


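$$\int_a^b dx\, f^*(x)\,g(x) = \lim_{\substack{\Delta x \to 0\\ N \to \infty}} \sum_{i=1}^{N} f^*(x_i)\,g(x_i)\,\Delta x,$$

where the interval $(a, b)$ is divided into $N$ subintervals of width
$\Delta x = (b - a)/N$ and $x_i$ lies in the $i$th subinterval.
Consider the scalar product (bracket product) of the vectors

$$\langle S_1|S_2\rangle = \sum_i A_i^*\,B_i,$$

where

$$|S_1\rangle = \sum_i A_i\,|e_i\rangle, \qquad |S_2\rangle = \sum_i B_i\,|e_i\rangle,$$

and the basis kets $|e_i\rangle$ are orthonormal.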


By comparing this with the expression for the definite integral, we see the 
correspondence between the two expressions. We can interpret the integral 
of a product as a “scalar product" of two vectors. We can interpret a 
function as a vector which has an infinite number of components. Each 
“component” of the function is its value for a certain value of x, and instead 
of labeling the components by the names of coordinates we label them by 
the value of x at which the function takes on this value. This analogy 
between functions and vectors is carried over into the terminology of 
functions. For example, if 


$$\int_a^b dx\, f^*(x)\,g(x) = 0,$$

the two functions are said to be orthogonal in the interval $(a, b)$.
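The passage from the sum to the integral can be made concrete. The sketch approximates $\langle f|g\rangle$ by the finite sum $\sum_i f^*(x_i)g(x_i)\,\Delta x$ over $N$ midpoints, which is the vector scalar product with a factor $\Delta x$ attached; NumPy's `vdot` conjugates its first argument, playing the role of the bra. The functions $\sin x$ and $\sin 2x$ on $(0, \pi)$ are illustrative choices: the first pair should give $\pi/2$ and the second pair should be orthogonal.

```python
import numpy as np

def bracket(f, g, a, b, N):
    """Approximate <f|g> = integral_a^b f*(x) g(x) dx by a midpoint sum of N terms."""
    dx = (b - a) / N
    x = a + (np.arange(N) + 0.5) * dx     # midpoints x_i, one per subinterval
    return np.vdot(f(x), g(x)) * dx       # sum_i conj(f(x_i)) g(x_i) dx

for N in (4, 16, 256):
    print(N,
          bracket(np.sin, np.sin, 0, np.pi, N),                     # tends to pi/2
          bracket(np.sin, lambda x: np.sin(2 * x), 0, np.pi, N))    # orthogonal pair
```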
We now see how to interpret a bracket product if the bra and ket are 
functions. If $|S_1\rangle = f(x)$ and $|S_2\rangle = g(x)$, then

$$\langle S_1|S_2\rangle = \int_a^b dx\, f^*(x)\,g(x),$$

where the range of x from a to b is determined by the range over which the
physical meaning of $|S_1\rangle$ and $|S_2\rangle$ is defined. We also see that when a ket
vector is a function, the bra vector which is the adjoint of the ket is given 
by the complex conjugate of the ket function. 

It is fairly obvious how to interpret something like $\langle S_1|P_x|S_2\rangle$, where
$P_x = -i\hbar(\partial/\partial x)$. We have

$$\langle S_1|P_x|S_2\rangle = -i\hbar\int_a^b dx\, f^*(x)\,\frac{\partial}{\partial x}g(x).$$


There is only one difficulty left, and that is the meaning of an adjoint 
operator when the operator operates on a function. Let us recall what we 
mean by the adjoint $M^\dagger$ of a matrix $M$ operating on a ket $|S\rangle$. By definition
$[|S\rangle]^\dagger = \langle S|$, and $M^\dagger$ is defined by

$$\langle S|M^\dagger = [M|S\rangle]^\dagger.$$


It is not clear, however, what the meaning would be for a differential operator
placed to the right of the function on which it is to operate. For vectors and
matrices we can write

$$\langle S|M^\dagger|S'\rangle = [M|S\rangle]^\dagger\,|S'\rangle$$

and thus define the adjoint operator $P^\dagger$ for functions by

$$\int_a^b dx\, f^*\,P^\dagger g = \int_a^b dx\,(Pf)^*\,g.$$


If for a matrix operator $M^\dagger = M$, $M$ is called a Hermitian operator. If it
happens that

$$\int_a^b dx\,(Pf)^*\,g = \int_a^b dx\, f^*\,Pg,$$

then $P$ is said to be Hermitian.
It is left to the exercises to show that the linear momentum and the energy
operators which we have found are Hermitian in this sense.
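The Hermiticity condition above can be illustrated numerically for $P_x = -i\hbar\,\partial/\partial x$. In this sketch $\hbar = 1$, the two sample functions are arbitrary choices that vanish far from the origin (the same condition Problem 5.2 asks one to assume), derivatives come from NumPy's `gradient` on a uniform grid, and the integrals are Riemann sums. It illustrates the property for one pair of functions; it is not a proof.

```python
import numpy as np

hbar = 1.0
x, dx = np.linspace(-12, 12, 6001, retstep=True)

def P(u):
    """Apply P_x = -i hbar d/dx to samples u on the grid (central differences)."""
    return -1j * hbar * np.gradient(u, dx)

def integral(u):
    """Riemann-sum approximation of the integral over the grid."""
    return np.sum(u) * dx

# Illustrative functions that are negligible at the ends of the grid.
f = np.exp(-x**2) * np.exp(2j * x)
g = (1 + x) * np.exp(-(x - 1)**2 / 2)

lhs = integral(np.conj(P(f)) * g)   # integral of (P f)* g
rhs = integral(np.conj(f) * P(g))   # integral of f* (P g)
print(lhs, rhs)                     # equal to rounding accuracy
```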


5.11 CONTINUOUS SUPERPOSITION OF STATES 


We have seen that the function $A\,e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)}$ is simultaneously an eigenket (or
now an eigenfunction) of both the linear momentum operator and the energy
operator, and is a pure state with respect to both energy and linear
momentum measurements. The total ket of the state, of course, should also
include polarization and be written

$$|S\rangle = A\,e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)}\,|\alpha\rangle,$$


but we shall continue to ignore polarization for a while. 

Any state may be written as a superposition of pure states, but until 
now we have dealt only with a finite discrete set of different pure states. 
Now, the set of pure states is infinite, depending on a continuous variable, k. 
The extension should be apparent. We must go from the discrete sum 


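$$|S\rangle = \sum_i A_i\,|S_i\rangle$$

to an integration

$$|S\rangle = \int d\mathbf{k}\, A(\mathbf{k})\,|S(\mathbf{k})\rangle. \tag{5.30}$$

The symbol $\int d\mathbf{k}$ in Eq. (5.30) has been used as shorthand to indicate a triple
integration over the three independent components of $\mathbf{k}$, namely $k_x$, $k_y$, and $k_z$:

$$\int d\mathbf{k} \equiv \iiint dk_x\,dk_y\,dk_z.$$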


Let us examine this superposition in more detail in the case of one dimension, 
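where $\mathbf{k} = k\,\mathbf{e}_x$. We have

$$|x, t\rangle = \int dk\, a(k)\, e^{i(kx - \omega t)}.$$

But $\omega = ck$, so

$$|x, t\rangle = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} A(k, t)\, e^{ikx}\, dk,$$

where

$$A(k, t) = \sqrt{2\pi}\, a(k)\, e^{-ickt}.$$

We should like to examine $|x, t\rangle$ mainly with respect to the variables $x$
and $k$, so we could either neglect $t$ or set it equal to some particular value.
We could then write

$$|x\rangle = f(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} A(k)\, e^{ikx}\, dk. \tag{5.31}$$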


Whatever the function A(k) may be, so long as the integral exists and is 
not infinite, we are able to make a new function of the variable x out of it. 




We can pose the following mathematical problem. Is it possible to find two 
different functions A(k) which yield the same function f(x)? The rather 
surprising answer is that we cannot, and the pairing of functions through 
this method is one-to-one. The truly amazing thing, however, is that the 
whole process can be turned inside out; and if we know f(x), the function 
A(k) can be found by writing 


$$A(k) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} f(x)\, e^{-ikx}\, dx. \tag{5.32}$$


A proof of this is given in Appendix IV. 

Of course, for this to be true there must be certain restrictions on the 
functions $f(x)$ and $A(k)$, but these will almost always be met by the “nice”
functions with which we will be dealing. The functions $f(x)$ and
$A(k)$ are said to form a Fourier transform pair. To write a function $f(x)$ as a
Fourier transform is to decompose it into a sum (integral) of simple exponential
functions of spatial period $\lambda = 2\pi/k$. The function $A(k)$ tells us the extent
to which $f(x)$ is composed of the functions $e^{ikx}$ with $k$ lying between $k$ and
$k + dk$.
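The inversion formula (5.32) can be tested on a function whose transform is known. With the symmetric $1/\sqrt{2\pi}$ convention used here, $f(x) = e^{-x^2/2}$ is its own transform, $A(k) = e^{-k^2/2}$. The sketch evaluates (5.32) and then (5.31) as Riemann sums on truncated grids; the grid limits and spacing are assumptions chosen so that the Gaussian tails are negligible.

```python
import numpy as np

x, dx = np.linspace(-15, 15, 1201, retstep=True)
k, dk = np.linspace(-15, 15, 1201, retstep=True)
f = np.exp(-x**2 / 2)                      # test function; its transform is exp(-k^2/2)

# Eq. (5.32): A(k) = (1/sqrt(2 pi)) * integral of f(x) exp(-ikx) dx, as a Riemann sum.
A = np.exp(-1j * np.outer(k, x)) @ f * dx / np.sqrt(2 * np.pi)
# Eq. (5.31): rebuild f(x) = (1/sqrt(2 pi)) * integral of A(k) exp(ikx) dk.
f_back = np.exp(1j * np.outer(x, k)) @ A * dk / np.sqrt(2 * np.pi)

err_A = np.max(np.abs(A - np.exp(-k**2 / 2)))
err_f = np.max(np.abs(f_back - f))
print(err_A, err_f)                         # both tiny for this well-sampled Gaussian
```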

As an example, we are going to work out one particular state which is not 
one of the pure states we have found but a Fourier superposition. Instead 
of depending only on the coordinate x, let us try to find a state which depends 
only on the distance $r$ from the origin. We may imagine the point $\mathbf{r}$ to be
located on the z-axis with no loss of generality, so that $\mathbf{k}\cdot\mathbf{r} = kr\cos\theta$,
where $\theta$ is the angle between $\mathbf{k}$ and the z-axis. The simplest choice which we
can make for the function $A(\mathbf{k})$ is that it be a constant. In addition, let us
impose the restriction that for all states which we will add together the
different $\mathbf{k}$'s differ only in direction and not in magnitude. Thus, the tips
of all k-vectors will lie on the surface of a sphere of radius $k$ about the origin.
The proper region of integration in k-space is therefore the surface of this
sphere. Thus,

$$|S\rangle = \int A\, e^{ikr\cos\theta}\, dS.$$

But, from Fig. 5.6, $dS = k^2\sin\theta\, d\theta\, d\phi$, which after the integration over $\phi$ becomes

$$dS = 2\pi k^2 \sin\theta\, d\theta.$$

Fig. 5.6. The spherical area element $dS = k^2\sin\theta\,d\theta\,d\phi$. The integration over $\phi$
leaves the element $dS = 2\pi k^2\sin\theta\,d\theta$.

The integration may be carried out to yield

$$|S\rangle = \frac{2\pi A k}{ir}\Big[e^{ikr\cos\theta}\Big]_{\theta=\pi}^{\theta=0}.$$

If we reinsert the time dependence and then evaluate the limits, we obtain

$$|S\rangle = \frac{4\pi A k}{r}\,\sin(kr)\, e^{-i\omega t}.$$

Actually, this state function will be of very little interest to us. However,
the sine can be written as the difference of two exponentials, and part of
the above function,

$$|S'\rangle = \frac{2\pi A k}{ir}\, e^{i(kr - \omega t)}, \tag{5.33}$$
will be of great importance after we show in what way it can be considered 
a state function. First let us state that there is no simple way to obtain this 
function from a Fourier superposition, as in Eq. (5.30). However, we may 
identify Eq. (5.33) as an acceptable state function in another way. If one 
computes the expectation value for the component of linear momentum 
along any given direction using the state function of Eq. (5.33), one obtains 
zero. This is to be expected, since the state was obtained by adding with 
equal amplitudes states with all possible directions of k. Yet all of these 
k-vectors had the same magnitude. One suspects that this function might 
be an eigenstate, not of the linear momentum operator but of the operator 
for the magnitude of linear momentum, whose square would be 


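$$|\mathbf{P}|^2 = \mathbf{P}\cdot\mathbf{P} = (-i\hbar\nabla)\cdot(-i\hbar\nabla) = -\hbar^2\nabla^2.$$

If one writes out the Laplacian in spherical coordinates, which for a function
of $r$ alone reduces to

$$\nabla^2 = \frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial}{\partial r}\right),$$

it is simple to show that the state function of Eq. (5.33) satisfies

$$-\hbar^2\nabla^2|S'\rangle = \hbar^2k^2|S'\rangle.$$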


Thus, this is an acceptable state function which would be associated with a 
source radiating photons of fixed linear momentum magnitude $\hbar k$ uniformly in all
directions, i.e., a monochromatic point source. 
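The claim that the function of Eq. (5.33) is an eigenfunction of $-\hbar^2\nabla^2$ with eigenvalue $\hbar^2k^2$ can be checked away from $r = 0$ using the radial part of the Laplacian, $\nabla^2 u = u'' + (2/r)u'$ for a function of $r$ alone. The sketch drops the constant prefactor and the factor $e^{-i\omega t}$, which do not affect the eigenvalue, and uses finite differences with an arbitrarily chosen $k$ and sample radii.

```python
import numpy as np

k = 3.0                     # illustrative wave number
h = 1e-4                    # finite-difference step

def u(r):
    """Spatial part of Eq. (5.33) without its constant prefactor: exp(ikr)/r."""
    return np.exp(1j * k * r) / r

def radial_laplacian(func, r):
    """(1/r^2) d/dr (r^2 du/dr) = u'' + (2/r) u', by central differences."""
    d1 = (func(r + h) - func(r - h)) / (2 * h)
    d2 = (func(r + h) - 2 * func(r) + func(r - h)) / h**2
    return d2 + 2 * d1 / r

for r in (0.5, 2.0, 7.0):
    print(r, -radial_laplacian(u, r) / u(r))   # close to k**2 = 9 at every radius
```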


5.12 SPATIAL LOCALIZATION 


With the results of the preceding sections we have reached a point where we 
may explicitly write the state function as a function of space and time, and 
we can interpret the bracket product as an integral. We now examine once 
again the physical interpretation of the bracket product. 

When we were concerned only with polarization states, we interpreted 
<S|S> = 1 to mean that the probability that |S> would pass through any 
filter plus the probability that |S> would be absorbed was unity. Thus, the 
probability that |S^ will be detected in one state or another, or simply 
detected, is unity. It would be natural, when speaking of position, to 
interpret 


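$$\langle\mathbf{r}, \alpha|\mathbf{r}, \alpha\rangle = \langle\alpha|\alpha\rangle\int d\mathbf{r}\, f^*(\mathbf{r})\, f(\mathbf{r})$$

as the probability that a photon would be found somewhere and in some
polarization state. Much as was done in Eq. (5.30), we use the symbol $d\mathbf{r}$
to indicate a volume element often written $dV = dx\,dy\,dz$. The integral $\int d\mathbf{r}$
represents

$$\int d\mathbf{r} \equiv \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} dx\,dy\,dz.$$

With the above interpretation we would set

$$\langle\mathbf{r}, \alpha|\mathbf{r}, \alpha\rangle = 1.$$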


This is not in general possible. For example, if the state in question is a 
pure state of energy and momentum, we have 


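$$\langle S|S\rangle = \langle\alpha|\alpha\rangle\int d\mathbf{r}\, e^{-i(\mathbf{k}\cdot\mathbf{r} - \omega t)}\, e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)} = \langle\alpha|\alpha\rangle\int d\mathbf{r}.$$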


If the state is defined over all space, then the integral is to be taken over all 
space and is infinite. It is thus impossible to interpret these particular state 
functions as functions of a single photon. They can at best be interpreted 
as functions for a beam of photons which fills all space uniformly and, hence, 
carries an infinite energy. 

If |S> cannot be normalized to unity but represents a beam, f*f dr will 
be related to the intensity of the beam in the volume element dr. If the state 
can be normalized, $f^*f\,d\mathbf{r}$ will be taken as the probability of finding the photon
within the element $d\mathbf{r}$. Although we have indicated the reasonableness of
this interpretation, there are difficulties which must be pointed out. First of 
all, finding a photon at some point in space means that the photon must 
interact with matter to make its presence known. However, the interaction 
between photons and matter is an electromagnetic one. It is determined, 
therefore, by the distribution of charges and currents throughout all of space. 
These distributions have so far been completely neglected. However, it is 
still reasonable that the probability that an interaction will take place should 
be large in those regions where f*f dr is large. 

We now examine the implications of making $f^*f\,d\mathbf{r}$ large in a given
region of space and very small elsewhere, that is, localizing the photon.
The one-dimensional case will illustrate the situation. We ignore polarization,
and as we are still interested in the state function at some particular
instant of time, we continue to write

$$f(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} dk\, A(k)\, e^{ikx}.$$

As an example of a localized state function we consider the case illustrated
in Fig. 5.7. The probability density $f^*f$ has the constant value $1/w$ in the
region $-w/2 < x < w/2$ and is zero elsewhere. It is thus normalized to
$\langle S|S\rangle = 1$. Taking $f(x) = 1/\sqrt{w}$ in this region, we may solve for $A(k)$
by Eq. (5.32):

$$A(k) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} f(x)\, e^{-ikx}\, dx
= \frac{1}{\sqrt{2\pi}}\int_{-w/2}^{w/2} \frac{1}{\sqrt{w}}\, e^{-ikx}\, dx
= \sqrt{\frac{w}{2\pi}}\,\frac{\sin(wk/2)}{wk/2}.$$


Let $\Delta k$ be a measure of the region over which the function $A(k)$ is large.
A plot of this function is shown in Fig. 5.8. A reasonable measure of $\Delta k$
would be the distance between the first zeros of the function, at $k = \pm 2\pi/w$,
or $\Delta k = 2(2\pi/w)$. Since the region over which the state function $f(x)$ is large
is $\Delta x = w$, we have $\Delta x\,\Delta k = 4\pi$. The implication is that if we try to localize
the state function by making $\Delta x$ small, we must make $\Delta k$ large. What is the
physical significance of this?
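The closed form for $A(k)$ and the zero spacing behind $\Delta x\,\Delta k = 4\pi$ can be confirmed numerically. The sketch evaluates Eq. (5.32) for the rectangular $f(x) = 1/\sqrt{w}$ on $-w/2 < x < w/2$ by a midpoint sum and compares it with $\sqrt{w/2\pi}\,\sin(wk/2)/(wk/2)$, written with NumPy's `sinc` (which is $\sin(\pi u)/(\pi u)$) so that $k = 0$ needs no special case. The width $w = 2$ is an arbitrary choice.

```python
import numpy as np

w = 2.0                                   # illustrative width of the localized state
N = 20000
dx = w / N
x = -w / 2 + (np.arange(N) + 0.5) * dx   # midpoints inside -w/2 < x < w/2
f = np.full(N, 1 / np.sqrt(w))           # f*f = 1/w there, zero elsewhere

def A_numeric(k):
    """Eq. (5.32), restricted to the interval where f is nonzero."""
    return np.sum(f * np.exp(-1j * k * x)) * dx / np.sqrt(2 * np.pi)

def A_closed(k):
    """sqrt(w/2pi) sin(wk/2)/(wk/2); np.sinc(u) = sin(pi u)/(pi u) handles k = 0."""
    return np.sqrt(w / (2 * np.pi)) * np.sinc(w * k / (2 * np.pi))

for kv in (0.0, 1.0, 2 * np.pi / w, 5.0):
    print(kv, A_numeric(kv).real, A_closed(kv))

delta_k = 2 * (2 * np.pi / w)             # distance between the first zeros
print(w * delta_k / np.pi)                # Delta x * Delta k in units of pi
```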

In the Fourier superposition of states each pure state $e^{ikx}$ is a pure state
of linear momentum $p_x = \hbar k$. By superimposing these states we obtain a
nonpure state whose linear momentum is not well defined but has a dispersion
$\Delta p_x = \hbar\,\Delta k$. Thus, in the example given an attempt to localize the function
$f(x)$, and, hence, the photon, to a region $\Delta x$ leads to a dispersion in the linear
momentum. Likewise, attempting to limit the linear momentum dispersion
must result in a large uncertainty $\Delta x$ for the location of the photon. The
pure state of linear momentum, $e^{ikx}$, is completely unlocalized and has an
infinite dispersion $\Delta x$.

Fig. 5.7. A simple case of a localized state function: $f(x) = 1/\sqrt{w}$ for $-w/2 < x < w/2$ and zero elsewhere.

The result we have obtained is not due to the particular state function 
we have chosen to examine. We proceed to examine the relation between 
the dispersions in position and momentum measurements for any state. 
Let the x-dependence of the state function be written as any function 
$|S\rangle = f(x)$. The probability that this state will be detected between $x$ and
$x + dx$ is $f^*(x)f(x)\,dx$.

Fig. 5.8. The function $A(k)$, which is the Fourier transform of the function $f(x)$
shown in Fig. 5.7; its peak value is $\sqrt{w/2\pi}$ and its first zeros lie at $k = \pm 2\pi/w$.

The expectation value for the x-location of the
photon is obtained by weighting each value of x by the probability of 
measuring that value and is given by 


$$\langle x\rangle = \int dx\, f^*(x)\, x\, f(x).$$


We thus identify the variable x itself as the operator associated with the 
measurement of the x-coordinate at which the photon will be detected. In 
bracket notation the expectation value of x becomes 


<x> = <S|x|S>, 


where the bracket product must be interpreted as an integral over the 
coordinate $x$. If many similar states are prepared, the dispersion in the
values of $x$ which will be measured is as given in Section 5.4:

$$(\Delta x)^2 = \langle S|x^2 - \langle x\rangle^2|S\rangle. \tag{5.34}$$


Measurement of the linear momentum of the same state |S> will result 
in an expectation value for this momentum given by 


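$$\langle S|p_x|S\rangle = \langle p_x\rangle,$$

where $p_x = -i\hbar(\partial/\partial x)$. If many measurements are made, the dispersion in
the values obtained will be

$$(\Delta p_x)^2 = \langle S|p_x^2 - \langle p_x\rangle^2|S\rangle. \tag{5.35}$$

In Section 5.14 it will be shown that these definitions of the dispersions
$\Delta x$ and $\Delta p_x$ lead to Heisenberg's uncertainty relation:

$$\Delta x\,\Delta p_x \geq \tfrac{1}{2}\hbar. \tag{5.36}$$

The other coordinates and linear momenta satisfy similar relationships:

$$\Delta y\,\Delta p_y \geq \tfrac{1}{2}\hbar, \qquad \Delta z\,\Delta p_z \geq \tfrac{1}{2}\hbar.$$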
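The dispersions (5.34) and (5.35) can be evaluated on a grid for any sampled state. The sketch below uses $\hbar = 1$, finite-difference derivatives, and Riemann sums; it compares a Gaussian, for which $\Delta x\,\Delta p_x$ should come out at the bound $\hbar/2$, with the non-Gaussian state $(1 + x)e^{-x^2/2}$, for which a short calculation gives $(\Delta x)^2 = 7/18$, $(\Delta p_x)^2 = 5/6$, and a product $\sqrt{35/108} \approx 0.569$. Both test states are illustrative choices, not taken from the text.

```python
import numpy as np

hbar = 1.0                                   # units with hbar = 1 (assumption)
x, dx = np.linspace(-20, 20, 8001, retstep=True)

def dispersions(psi):
    """Return (Delta x, Delta p_x) of Eqs. (5.34)-(5.35) for samples psi on the grid."""
    psi = psi / np.sqrt(np.sum(np.abs(psi)**2) * dx)   # normalize so <S|S> = 1

    def bracket(op_psi):
        """<S|op|S> as a Riemann sum, given op already applied to psi."""
        return (np.sum(np.conj(psi) * op_psi) * dx).real

    d1 = np.gradient(psi, dx)
    d2 = np.gradient(d1, dx)
    mean_x, mean_x2 = bracket(x * psi), bracket(x**2 * psi)
    mean_p, mean_p2 = bracket(-1j * hbar * d1), bracket(-hbar**2 * d2)
    return np.sqrt(mean_x2 - mean_x**2), np.sqrt(mean_p2 - mean_p**2)

states = {
    "gaussian": np.exp(-x**2 / 4),
    "(1 + x) gaussian": (1 + x) * np.exp(-x**2 / 2),
}
for name, psi in states.items():
    sx, sp = dispersions(psi)
    print(f"{name}: Delta x = {sx:.4f}, Delta p = {sp:.4f}, "
          f"product = {sx * sp:.4f} (bound {hbar / 2})")
```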


An example commonly used to illustrate these results is the diffraction 
of a light beam by a slit. As shown in Fig. 5.9, a wide beam is incident from 
the left on a slit of width d. The plane containing the slit is taken along the 
x-axis and perpendicular to the beam direction. The uncertainty in the 
x-coordinate of an incident photon may be taken as the width of the beam. 
If this is very large, we may make the uncertainty, $\Delta p_x$, of the x-component
of the incident momentum very small without violating Eq. (5.36). We may
therefore say that the x-component of momentum is essentially zero and that
the beam is traveling parallel to the z-axis. Placing the slit in the beam
localizes the beam to a region in $x$ of the slit width. If this is made small,
then by Eq. (5.36) the uncertainty in the x-component of linear momentum
must now become large. On the right of the slit the beam should therefore
spread out. This is indeed what happens. The calculation of this spreading
is given in the following chapter and it is left to the problems of that chapter
to show that the spreading is in agreement with Eq. (5.36).

Fig. 5.9. Diffraction by a slit.

Equation (5.36) tells us that we may not prepare states which simultaneously
have arbitrarily small dispersions in position and in linear momentum.
It is sometimes stated that simultaneous measurements of position and 
momentum cannot result in arbitrarily small uncertainties or dispersions. 
The second form of the statement follows from the first. If the states cannot 
be prepared, the measurements cannot be made. However, the first form is 
often more closely related to experiment. In the slit experiment it is assumed
that after leaving the slit the x-component of linear momentum is not further 
modified. It can therefore be measured by the position of absorption of the 
photon in a photographic emulsion in the x'-plane. This measurement is not 
simultaneous with the position determination which was made by the slit at 
an earlier time. The slit did take an incident state with small dispersion in 
x-momentum and attempt to prepare from it a state which simultaneously 
had small dispersion in the x-position. This cannot be done. 


5.13 TEMPORAL LOCALIZATION 


Let us again consider the full one-dimensional superposition 
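$$|S\rangle = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} dk\, A(k)\, e^{i(kx - \omega t)},$$

but examine this function at a particular point in space as a function of time.
Without loss of generality we take this point to be the origin $x = 0$ and,
changing the variable of integration from $k$ to $\omega = ck$ (so that $dk = d\omega/c$),
obtain

$$f(t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} d\omega\, \mathcal{A}(\omega)\, e^{-i\omega t}, \tag{5.37}$$

where we have set

$$\mathcal{A}(\omega) = \frac{1}{c}\,A\!\left(\frac{\omega}{c}\right).$$

Fig. 5.10. State function for a photon localized within the time interval $\Delta t = \tau$:
$f(t) = 1/\sqrt{\tau}$ for $-\tau/2 < t < \tau/2$ and zero elsewhere.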


Now, the probability that the photon will be detected between time $t$
and time $t + dt$ is given by $f^*(t)f(t)\,dt$. We may attempt to localize a photon
in time by using a state function, as shown in Fig. 5.10. The function takes
on the value $1/\sqrt{\tau}$ for $-\tau/2 < t < \tau/2$ and is zero at other times. The
probability that the photon will be detected at the origin within the time
interval $\Delta t = \tau$ around $t = 0$ is unity. As usual, we may use the inverse
transform and obtain $\mathcal{A}(\omega)$:

$$\mathcal{A}(\omega) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} f(t)\, e^{i\omega t}\, dt
= \frac{1}{\sqrt{2\pi}}\int_{-\tau/2}^{\tau/2} \frac{1}{\sqrt{\tau}}\, e^{i\omega t}\, dt
= \sqrt{\frac{\tau}{2\pi}}\,\frac{\sin(\omega\tau/2)}{\omega\tau/2}.$$


The function $\mathcal{A}(\omega)$ is shown in Fig. 5.11.

In order to obtain a state function localized in time, we have been forced
to superimpose many pure states, $e^{-i\omega t}$. The resulting dispersion $\Delta\omega$ in $\omega$
may be taken as the interval between the first zeros of $\mathcal{A}(\omega)$, at $\omega = \pm 2\pi/\tau$,
and is given by $\Delta\omega = 2(2\pi/\tau)$. We obtain for the product of the dispersions
$\Delta t\,\Delta\omega = 4\pi$.
Just as an uncertainty in $k$ corresponds to an uncertainty in linear momentum,
an uncertainty in $\omega$ corresponds to an uncertainty in energy $\Delta E = \hbar\,\Delta\omega$, so
that $\Delta t\,\Delta E = 4\pi\hbar$. In the following section we shall show that, regardless
of what time dependence is chosen for the state function, the dispersions
must obey

$$\Delta t\,\Delta E \geq \tfrac{1}{2}\hbar. \tag{5.38}$$

Fig. 5.11. The function $\mathcal{A}(\omega)$, the Fourier transform of $f(t)$ shown in Fig. 5.10;
its peak value is $\sqrt{\tau/2\pi}$ and its first zeros lie at $\omega = \pm 2\pi/\tau$.
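In the general case one defines the dispersion $\Delta t$ by

$$(\Delta t)^2 = \langle S|t^2 - \langle t\rangle^2|S\rangle,$$

where

$$\langle t\rangle = \langle S|t|S\rangle.$$

The dispersion $\Delta E$ is defined, using the energy operator $E = i\hbar(\partial/\partial t)$, by

$$(\Delta E)^2 = \left\langle S\left| -\hbar^2\frac{\partial^2}{\partial t^2} - \langle E\rangle^2 \right|S\right\rangle,$$

with

$$\langle E\rangle = \left\langle S\left| i\hbar\frac{\partial}{\partial t} \right|S\right\rangle.$$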


The interpretations of Eqs. (5.36) and (5.38) differ in some respects. 
Equation (5.36) applies to the preparation of states at a given moment in 
time. Equation (5.38) of its nature is concerned with states considered over 
an explicitly nonzero time interval. In fact, Eq. (5.38) directly relates the 
length of time over which an energy measurement is made to the dispersion
of energies which will be measured. One can attempt to confine a light beam
to a pulse by means of a shutter which is open for the time interval $\Delta t$.
Following the shutter a detector may be placed which measures the energy
of photons striking it. The dispersion $\Delta E$ which is measured and the open
shutter time $\Delta t$ obey Eq. (5.38). Note that $\Delta t$ is not the time during which
the detector registers the absorption of a single photon. This time can be
very much smaller than the $\Delta t$ allowed by Eq. (5.38).

Equation (5.38) then relates the temporal duration of a state to its 
energetic purity. In principle, a perfectly monoenergetic beam would have 
to exist for an infinite length of time. Because of the smallness of $\hbar$, however,
a fairly monoenergetic beam is not an impossibility. For photon energies in
the visible region a pulse duration of 1 microsecond ($10^{-6}$ sec) is compatible
with a relative energy dispersion, $\Delta E/E$, of roughly one part in a billion ($10^{-9}$).
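The figures quoted above can be reproduced as an order-of-magnitude estimate. The sketch assumes a wavelength of 500 nm as representative of the visible region and evaluates $\Delta E/E$ both at the bound $\Delta E = \hbar/(2\Delta t)$ of Eq. (5.38) and at $\Delta E = 4\pi\hbar/\Delta t$, the value found for the square pulse; the quoted one part in a billion lies between the two.

```python
import math

hbar = 1.054571817e-34          # reduced Planck constant, J s
h = 2 * math.pi * hbar          # Planck constant, J s
c = 299792458.0                 # speed of light, m/s
lam = 500e-9                    # assumed visible wavelength, m
dt = 1e-6                       # pulse duration: 1 microsecond

E = h * c / lam                                # photon energy, about 4e-19 J
ratio_bound = (hbar / (2 * dt)) / E            # Eq. (5.38) at equality
ratio_square = (4 * math.pi * hbar / dt) / E   # square pulse: Delta t Delta E = 4 pi hbar
print(f"E = {E:.3e} J; Delta E / E between {ratio_bound:.1e} and {ratio_square:.1e}")
```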

An instructive application of Eq. (5.38) is obtained by allowing the light 
source itself to act as the shutter which determines $\Delta t$. By this is meant that
isolated sources are self-extinguishing and cease to emit when their internal
energy is consumed. Consider a large group of individual radiators (atoms,
molecules, etc.) which radiate photons of energy $\hbar\omega$ in decaying from an
excited energy state, $E_{ex}$, to the ground state, $E_0$. If these sources are
independent radiators, the intensity of radiation $I(t)$ follows an exponential
decay as shown in Fig. 5.12 and obeys

$$I(t) = I_0\, e^{-t/\tau}.$$

The intensity decreases as the population of the higher energy state is depleted
by radiative transitions. The decay time, $\tau$, at which the intensity reaches
$1/e$ of its initial value may be taken both as a measure of the lifetime of the
higher state and the length of time in which the radiation intensity is appreciable.
Thus, the decay itself is equivalent to a shutter which is open for a time
interval of approximately $\tau$. The dispersion in photon energies, $\Delta E$, should
obey Eq. (5.38) in the form $\tau\,\Delta E \geq \tfrac{1}{2}\hbar$. This relation is in agreement with
much experimental evidence.

Fig. 5.12. The exponential decay of intensity from an isolated collection of
independent sources; $I(t)$ falls to $I_0/e \approx 0.368\,I_0$ at $t = \tau$.

Fig. 5.13. A two-level energy system which decays by emission of a photon of
energy $\hbar\omega$. If the lifetime of the excited state is $\tau$, the energy of this state is
uncertain by $\Delta E \geq \hbar/2\tau$.
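To attach numbers to $\tau\,\Delta E \geq \tfrac12\hbar$, the sketch converts an assumed excited-state lifetime into the minimum spread in photon energy and the corresponding minimum frequency spread $\Delta\nu = \Delta E/h$. The lifetime of 10 ns is an illustrative value, not one given in the text.

```python
import math

hbar = 1.054571817e-34       # reduced Planck constant, J s
eV = 1.602176634e-19         # joules per electron volt
tau = 10e-9                  # assumed excited-state lifetime, s

dE_min = hbar / (2 * tau)                 # from tau * Delta E >= hbar / 2
dnu_min = dE_min / (2 * math.pi * hbar)   # Delta nu = Delta E / h = 1 / (4 pi tau)
print(f"Delta E >= {dE_min:.2e} J = {dE_min / eV:.2e} eV; Delta nu >= {dnu_min:.2e} Hz")
```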

If there is a dispersion $\Delta E$ among the emitted photons, then there must
be a dispersion of the energies $E_{ex} - E_0 = \hbar\omega$, as shown in Fig. 5.13.
Equation (5.38) can be applied to the radiation sources themselves as well
as to the photons they emit. In fact (along with Eq. 5.36) it appears to apply
to all physical systems. If one attempts to prepare many similar physical
systems with the same energy, $E$, with lifetime $\Delta t$, the dispersion in energies,
$\Delta E$, and the lifetime $\Delta t$ must obey Eq. (5.38).


5.14 COMMUTATION AND UNCERTAINTY 


In this section we derive a general relation of which Eqs. (5.36) and (5.38) 
are special cases. We consider therefore any two Hermitian operators, M 
and N. These operators are associated with certain measurable attributes 
of a state. If many measurements are performed on similarly prepared states,
$|S\rangle$, the average values obtained will be the expectation values

$$\langle M\rangle = \langle S|M|S\rangle, \qquad \langle N\rangle = \langle S|N|S\rangle.$$

A measure of the dispersions of the individual values about the average
values will be taken as being given by

$$(\Delta M)^2 = \langle S|M^2 - \langle M\rangle^2|S\rangle, \qquad (\Delta N)^2 = \langle S|N^2 - \langle N\rangle^2|S\rangle.$$




We investigate the extent to which the product of the dispersions $\Delta M\,\Delta N$
may be minimized by the choice of the state $|S\rangle$ upon which the measurements
are made.

Proceeding formally, we define the two kets 


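$$|\psi_1\rangle = (M - \langle M\rangle)|S\rangle, \qquad |\psi_2\rangle = (N - \langle N\rangle)|S\rangle.$$

Since $M$ and $N$ are Hermitian and $|S\rangle$ is normalized, $\langle\psi_1|\psi_1\rangle = (\Delta M)^2$
and $\langle\psi_2|\psi_2\rangle = (\Delta N)^2$, so the product of the dispersions may be written

$$\Delta M\,\Delta N = \sqrt{\langle\psi_1|\psi_1\rangle\,\langle\psi_2|\psi_2\rangle}.$$

Now, define the ket $|\psi\rangle$ by

$$|\psi\rangle = \frac{|\psi_1\rangle}{\sqrt{\langle\psi_1|\psi_1\rangle}} + i\,\frac{|\psi_2\rangle}{\sqrt{\langle\psi_2|\psi_2\rangle}},$$

which leads to

$$\langle\psi|\psi\rangle = 2 + \frac{i\left(\langle\psi_1|\psi_2\rangle - \langle\psi_2|\psi_1\rangle\right)}{\Delta M\,\Delta N}.$$

The bracket product $\langle\psi|\psi\rangle$ is real and nonnegative, so we must have

$$2\,\Delta M\,\Delta N \geq -i\left(\langle\psi_1|\psi_2\rangle - \langle\psi_2|\psi_1\rangle\right).$$

The difference $\langle\psi_1|\psi_2\rangle - \langle\psi_2|\psi_1\rangle$ is purely imaginary, so the right-hand
side is real; repeating the argument with $-i$ in place of $i$ in the definition of
$|\psi\rangle$ gives the same inequality with the opposite sign on the right. Together
the two results give

$$\Delta M\,\Delta N \geq \tfrac{1}{2}\left|\langle\psi_1|\psi_2\rangle - \langle\psi_2|\psi_1\rangle\right|.$$

From the definitions of $|\psi_1\rangle$ and $|\psi_2\rangle$ this becomes

$$\Delta M\,\Delta N \geq \tfrac{1}{2}\left|\langle S|MN - NM|S\rangle\right|, \tag{5.39}$$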


where the right-hand member is identically zero if M and N commute. It 
is important to remember that Eq. (5.39) is an inequality and not an equality. 
For certain states |S> the product of the dispersions may be very large com- 
pared to the right-hand side of Eq. (5.39). However, it is impossible to 
prepare states for which the product of the dispersions will be smaller than 
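the value given above. The application of Eq. (5.39) to the cases of Sections
5.12 and 5.13 is easy. If $M$ represents the position operator $x$ and $N$ represents
the linear momentum operator $p_x = -i\hbar(\partial/\partial x)$, we have, using the product
rule $\partial(xf)/\partial x = f + x\,\partial f/\partial x$,

$$\left\langle S\left| x\left(-i\hbar\frac{\partial}{\partial x}\right) - \left(-i\hbar\frac{\partial}{\partial x}\right)x \right|S\right\rangle
= \left\langle S\left| -i\hbar x\frac{\partial}{\partial x} + i\hbar + i\hbar x\frac{\partial}{\partial x} \right|S\right\rangle
= \langle S|i\hbar|S\rangle = i\hbar.$$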




Thus, Eq. (5.39) yields Eq. (5.36): 
$$\Delta x\,\Delta p_x \geq \tfrac{1}{2}\hbar.$$


The remaining uncertainty relations follow as easily and their derivation 
from Eq. (5.39) is left to the problems. 

Though Eq. (5.39) may be applied to any two Hermitian operators, it 
should be apparent from the previous sections that the physical interpretation 
of the result must be very carefully examined in each case. 
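Equation (5.39) holds for any pair of Hermitian operators and any normalized state, which makes it easy to test by random sampling in a finite-dimensional setting. The sketch draws random $3 \times 3$ Hermitian matrices and random normalized states (the dimension, seed, and number of trials are arbitrary) and records the smallest value of $\Delta M\,\Delta N - \tfrac12|\langle S|MN - NM|S\rangle|$, which should never be negative apart from rounding.

```python
import numpy as np

rng = np.random.default_rng(0)              # fixed seed so runs are repeatable

def random_hermitian(n):
    X = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (X + X.conj().T) / 2

def expect(op, s):
    """<S|op|S> for a normalized state vector s (np.vdot conjugates its first argument)."""
    return np.vdot(s, op @ s)

def check(M, N, s):
    """Return (Delta M * Delta N, right-hand side of Eq. (5.39)) for a normalized s."""
    var_M = expect(M @ M, s).real - expect(M, s).real ** 2
    var_N = expect(N @ N, s).real - expect(N, s).real ** 2
    rhs = 0.5 * abs(expect(M @ N - N @ M, s))
    return np.sqrt(max(var_M, 0.0)) * np.sqrt(max(var_N, 0.0)), rhs

worst = np.inf
for _ in range(2000):
    M, N = random_hermitian(3), random_hermitian(3)
    s = rng.normal(size=3) + 1j * rng.normal(size=3)
    s /= np.linalg.norm(s)
    lhs, rhs = check(M, N, s)
    worst = min(worst, lhs - rhs)
print("smallest Delta M Delta N - rhs over all trials:", worst)   # expected >= 0 up to rounding
```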


PROBLEMS 


5.1 Prove that the eigenvalues of an anti-Hermitian operator are imaginary. 


5.2 Prove that the linear momentum operator $p_x = -i\hbar(\partial/\partial x)$ is Hermitian.
Assume that all state functions vanish at infinity. 

5.3 Prove that a necessary and sufficient condition that two matrices M and N 
be identical is that $\langle S|M|S'\rangle = \langle S|N|S'\rangle$ for all $|S\rangle$ and $|S'\rangle$.

5.4 Show that two matrices are identical if they have a common set of eigenfunctions
and eigenvalues. [Hint: Use the result of Problem 5.3 and expand $|S\rangle$
and $|S'\rangle$ in the common eigenfunctions.]

5.5 Show that the eigenvalues of any projection operator can only be 0 or 1.


5.6 If two projection operators commute, prove that their product is a projection 
operator. Find a simple example to prove that the converse of this statement is 
not true. 


5.7 Give a counter-example to show that a matrix with real eigenvalues need not 
be Hermitian. 


5.8 Find a simple matrix S whose elements are functions of time for which 
$S(dS/dt) \neq (dS/dt)S$.

5.9 Consider the product of two exponential operators $e^{S_1}e^{S_2}$. What special
precaution must be taken if this product is to be written $e^{S_1 + S_2}$?


5.10 Show that the expectation value of the x-component of linear momentum 
associated with the state $|S'\rangle$ given by Eq. (5.33) is zero.
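5.11 Consider the matrix
$$\sigma_{\theta\phi} = \begin{pmatrix} \cos\theta & e^{-i\phi}\sin\theta \\ e^{i\phi}\sin\theta & -\cos\theta \end{pmatrix},$$
where $\theta$ and $\phi$ are the usual angles defined for spherical coordinates (see Fig. 6.2). Is this matrix Hermitian? Is it unitary? Prove that the eigenvectors of this matrix are
$$|S_1\rangle = \begin{pmatrix} \cos\frac{\theta}{2} \\ e^{i\phi}\sin\frac{\theta}{2} \end{pmatrix}, \qquad |S_2\rangle = \begin{pmatrix} -e^{-i\phi}\sin\frac{\theta}{2} \\ \cos\frac{\theta}{2} \end{pmatrix},$$
with eigenvalues $+1$ and $-1$.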




5.12 Compare the matrix $\sigma_{\theta\phi}$ with the form of the matrix operator given in Problem 4.7. What are the Stokes parameters associated with $\sigma_{\theta\phi}$? What are the components of a vector whose direction in real space is given by the angles $\theta$ and $\phi$ in spherical coordinates?

Answer: $0,\ 2\cos\theta,\ 2\cos\phi\cos\theta,\ 2\sin\phi\cos\theta$.
5.13 The matrix $S_{\theta\phi}$, defined by $S_{\theta\phi} = (\hbar/2)\,\sigma_{\theta\phi}$, may be taken as the operator associated with the component of spin angular momentum along the $\theta\phi$-direction of a nonrelativistic electron. If the electron is in state $|S_1\rangle$ of Problem 5.11, a measurement of the angular momentum in the $\theta\phi$-direction yields $+\hbar/2$ with probability 1. If the electron is in state $|S_2\rangle$, this projection is $-\hbar/2$.

a) Assume that the system (the electron) is in the state described by
$$|S\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix}.$$
What is the expectation value of the operator $S_{\theta\phi}$? If $S_{\theta\phi}$ is considered as a resolver, what is the probability of measuring the eigenvalue $+\hbar/2$?

b) For a system in the state $|S\rangle$ of (a), for what particular values of $\theta$ and $\phi$ is the angular momentum certainly $+\hbar/2$? Why is it reasonable to associate the state $|S\rangle$ with an electron whose spin angular momentum is along the $z$-direction?

Answer: (a) $(\hbar/2)\cos\theta$, $\cos^2(\theta/2)$. (b) $\theta = 0$.

5.14 Redraw Fig. 5.2, adding new boxes as necessary so that the final output is 
still S- E when the output of the counters C and C’ is the total number of counts 
rather than a time average, as supposed in Section 5.3. 


5.15 A beam of light encounters two identical half-wave plates in succession. The first plate is fixed and the second plate rotates about the beam axis in such a way that the angle $\phi$ between the fast axes of the two plates is given by $\phi = \omega t$. If $|0\rangle$ is the state of the emergent beam at $t = 0$ and $|t\rangle$ the state at time $t$, find the time operator $U(t, 0)$ such that $|t\rangle = U(t, 0)\,|0\rangle$. [Hint: $|0\rangle$ is identical with the original state before the plates. Which of the sigma matrices commutes with $U(t, 0)$? What is the constant of the motion associated with this sigma matrix? Interpret $U(t, 0)$ in terms of the Poincaré sphere.]

Answer:
$$U(t, 0) = \begin{pmatrix} \cos 2\omega t & -\sin 2\omega t \\ \sin 2\omega t & \cos 2\omega t \end{pmatrix}.$$


5.16 An experiment is performed to measure the probability that a P-state whose line is at angle $\theta$ to the $x$-axis will pass through a $P_x$-filter. If $P_x$ is the matrix for the filter, the expectation value of the experiment is given by Malus' law, $\langle P|P_x|P\rangle = \cos^2\theta$. Show that the dispersion $\Delta P_x$ is given by $\Delta P_x = \cos\theta\sin\theta$.

5.17 It is impossible to prepare a beam which is simultaneously in two different P-states whose lines are not parallel. Taking the matrices $M$ and $N$ in Eq. (5.39) to be the matrices for two P-filters, $P$ and $P'$, compute $\Delta P\,\Delta P'$ using Eq. (5.39). Show that if the correct states $|S\rangle$ are chosen, $\Delta P\,\Delta P'$ becomes zero. Compute the
product of the dispersions, using the results of Problem 5.16, and obtain the same 
result. Why does this result not imply that a beam may simultaneously be in two 
different P-states? Consider these results in the light of the closing remarks of this 
chapter. 

5.18 Use Eq. (5.39) to obtain Eq. (5.38). 


CHAPTER 6 


DIFFRACTION AND IMAGE FORMATION 


Since the state function will generally be a function of position, it will also depend on the location and geometry of the various objects which exist within the field. In this chapter we shall investigate the effect of such objects on the state function. We assume an incident state given within some region, described by a state function $|S\rangle_{\text{inc}}$. We now modify the incident state by adding one or more objects to the region (or removing them from it). Of course, the state function will change. Call the new state the modified state, $|S\rangle_{\text{mod}}$. The basic diffraction problem is the determination of $|S\rangle_{\text{mod}}$ when $|S\rangle_{\text{inc}}$ is known and when the nature of the added objects is specified. In general, this is an exceedingly difficult problem. In fact, even specifying these added objects mathematically in any rigorous way is often difficult, if not impossible. We shall, therefore, examine cases of relative simplicity for which $|S\rangle_{\text{inc}}$ and $|S\rangle_{\text{mod}}$ are assumed to be scalar functions of position and for which polarization effects may be neglected.

The results obtained will be applied to the extremely practical problem 
of image formation by optical systems. We shall see that the formation of 
an image may be viewed as a double diffraction process. This viewpoint gives us powerful control over the quality and type of image formed for a given object.


6.1 GREEN'S THEOREM METHOD 


In the abstract theory of vector functions, where no physical interpretation 
is implied, it is possible to prove certain theorems relating volume integrals 
to integrals over surfaces. One of these theorems, known as Green's theorem, 
is given below: 


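$$\int_V d\tau\,\left(\psi\nabla^2\phi - \phi\nabla^2\psi\right) = \oint_A \left(\psi\nabla\phi - \phi\nabla\psi\right)\cdot\mathbf{n}\,dA. \qquad (6.1)$$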




In this equation $\phi$ and $\psi$ are arbitrary scalar functions of position, except
that they must possess first and second partial derivatives with respect to all 
spatial coordinates so that the Laplacian and gradient operators are meaning- 
ful. They must also take on only such values as will leave the integrals 
finite. In short, the physicist would say that they must be “nice” functions. 

Since this theorem holds for any two scalar functions, it certainly holds 
for any two scalar functions which happen to be photon state functions. 

We now assume that the functions $\phi$ and $\psi$ are state functions which are known to obey
$$\nabla^2 f(\mathbf{r}) = -k^2 f(\mathbf{r}), \qquad (6.2)$$
or are monochromatic Fourier superpositions of the form
$$f(\mathbf{r}) = \frac{1}{(2\pi)^{3/2}}\int d\mathbf{k}\,A(\mathbf{k})\,e^{i\mathbf{k}\cdot\mathbf{r}}, \qquad (6.3)$$
where the magnitude of $\mathbf{k}$ is constant, independent of the direction of $\mathbf{k}$. It is left as an exercise to show that the function $f(\mathbf{r})$ as defined by Eq. (6.3) satisfies Eq. (6.2) when the magnitude of $\mathbf{k}$ is constant.
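The exercise can also be explored numerically. The sketch below is a hedged illustration rather than a proof: it replaces the integral of Eq. (6.3) by a finite sum of plane waves with arbitrarily chosen (hypothetical) amplitudes and directions, all with the same $|\mathbf{k}|$, and checks with a finite-difference Laplacian that Eq. (6.2) holds at a sample point. Giving one wave vector a different magnitude makes the residual large.

```python
import cmath
import math


def wave_vector(k, theta, phi):
    """Cartesian components of a wave vector of magnitude k pointing along (theta, phi)."""
    return (k * math.sin(theta) * math.cos(phi),
            k * math.sin(theta) * math.sin(phi),
            k * math.cos(theta))


def superposition(r, waves):
    """f(r) = sum_j A_j exp(i k_j . r): a discrete stand-in for the integral in Eq. (6.3)."""
    x, y, z = r
    return sum(a * cmath.exp(1j * (kx * x + ky * y + kz * z)) for a, (kx, ky, kz) in waves)


def laplacian(f, r, h=1e-3):
    """Second-order central-difference approximation to the Laplacian of f at r."""
    x, y, z = r
    f0 = f(r)
    total = 0j
    for dx, dy, dz in ((h, 0.0, 0.0), (0.0, h, 0.0), (0.0, 0.0, h)):
        total += f((x + dx, y + dy, z + dz)) - 2.0 * f0 + f((x - dx, y - dy, z - dz))
    return total / h**2


def helmholtz_residual(waves, k, point):
    """|laplacian(f) + k^2 f| at `point`; Eq. (6.2) requires this to vanish."""
    def f(r):
        return superposition(r, waves)
    return abs(laplacian(f, point) + k**2 * f(point))


k = 2.0
# Hypothetical amplitudes and directions; every wave vector has magnitude k.
waves = [(1.0, wave_vector(k, 0.3, 1.1)),
         (0.5 - 0.2j, wave_vector(k, 2.0, -0.7)),
         (0.3j, wave_vector(k, 1.2, 2.5))]
point = (0.4, -0.3, 0.9)
print(helmholtz_residual(waves, k, point))   # small: limited only by the finite differences

mixed = waves[:2] + [(0.3j, wave_vector(2.5, 1.2, 2.5))]   # one wave with |k| = 2.5
print(helmholtz_residual(mixed, k, point))   # not small: Eq. (6.2) fails
```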

Making these assumptions, we easily see that the left-hand side of Eq. 
(6.1) vanishes and we obtain 


$$\oint_A \left(\psi\nabla\phi - \phi\nabla\psi\right)\cdot\mathbf{n}\,dA = 0. \qquad (6.4)$$


Before proceeding, it might be well to indicate where we are going. 
By restricting the state functions we shall consider to those which obey Eq. (6.2), we have greatly narrowed this class of functions. To see how much we have limited this class, we examine the following question. Assume that a function is defined within a finite volume and that we are given the value of this function and its gradient over the surface which bounds the volume. What do we know about the function inside the boundary? In general, nothing! But if we also demand that the function obey Eq. (6.2), we shall see that we know everything about it inside the boundary. The importance of this result for the diffraction problem is that we may replace the search for $|S\rangle_{\text{mod}}$ over a volume of space by a search for this function over a surface bounding that volume.

Continuing, we consider a point in space surrounded by some surface. We divide this surface into two parts. The first is a small sphere surrounding the point at which we wish to compute the state function. If this sphere is small enough, the average of the function over its surface will equal the value of the function at the center of the sphere. The geometry is shown in Fig. 6.1. Here $A$ and $A'$ together make up a surface bounding the volume. $A'$ is the spherical surface surrounding the point of interest located at $\mathbf{r}_0$. An arbitrary point within the volume or on either $A$ or $A'$ is located by $\mathbf{r}_1$, with $\mathbf{r}_1 = \mathbf{r} + \mathbf{r}_0$.

Fig. 6.1. The geometry and notation used in the derivation of Eq. (6.5).

Choose the function $\psi$ in Green's theorem to be the state function which exists in this region and choose the function $\phi$ to be
$$\phi = \frac{1}{r}\,e^{ikr},$$
where $r = |\mathbf{r}|$. This function satisfies Eq. (6.2) everywhere except at $r = 0$, which is precisely the point excluded from the volume by $A'$. Splitting the integral over the total surface into two integrals, one over $A$ and one over $A'$, we have


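$$\int_A\left[\psi\,\nabla\left(\frac{1}{r}e^{ikr}\right) - \frac{1}{r}e^{ikr}\,\nabla\psi\right]\cdot\mathbf{n}\,dA + \int_{A'}\left[\psi\,\nabla\left(\frac{1}{r}e^{ikr}\right) - \frac{1}{r}e^{ikr}\,\nabla\psi\right]\cdot\mathbf{n}'\,dA' = 0.$$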
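Since the second integral is over a small sphere, we write the gradient operator in spherical coordinates:
$$\nabla = \mathbf{e}_r\frac{\partial}{\partial r} + \mathbf{e}_\theta\frac{1}{r}\frac{\partial}{\partial\theta} + \mathbf{e}_\phi\frac{1}{r\sin\theta}\frac{\partial}{\partial\phi}.$$
The coordinates and the unit vectors are defined in standard fashion, as shown in Fig. 6.2.

Fig. 6.2. Spherical coordinates.

Taking the dot product with the inward-directed normal $\mathbf{n}'$, we replace both operations by $-\partial/\partial r$. Also, assuming that the sphere $A'$ can be made vanishingly small and that $\psi$ does not vary appreciably over its surface, we replace $\psi$ by its value at the center, $\psi_0$. We also replace $\partial\psi/\partial r$ by its maximum value over the surface, certainly without decreasing its contribution to the integral. The second integral then becomes
$$-\int_{A'}\left[\psi_0\,\frac{\partial}{\partial r}\left(\frac{e^{ikr}}{r}\right) - \frac{e^{ikr}}{r}\,\frac{\partial\psi}{\partial r}\right]dA' = -\int_{A'}\left[\psi_0\left(\frac{ik}{r} - \frac{1}{r^2}\right)e^{ikr} - \frac{e^{ikr}}{r}\,\frac{\partial\psi}{\partial r}\right]dA'.$$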


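In the limit as $kr$ approaches zero, $e^{ikr}$ may be replaced by unity. Since $r$ is constant over the sphere, the entire integrand is constant. The area of the sphere being $4\pi r^2$, we obtain for the integral
$$4\pi\psi_0 - ik\,4\pi r\,\psi_0 + \frac{\partial\psi}{\partial r}\,4\pi r.$$
As $r$ approaches zero, only the first term remains, so the integral over $A'$ contributes $4\pi\psi_0$. Solving for $\psi_0$, we finally obtain
$$\psi_0 = -\frac{1}{4\pi}\int_A\left[\psi\,\nabla\left(\frac{1}{r}e^{ikr}\right) - \frac{1}{r}e^{ikr}\,\nabla\psi\right]\cdot\mathbf{n}\,dA. \qquad (6.5)$$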
47 Ja r r 


Thus, if $\psi$ and $\nabla\psi$ are known over the surface $A$ which bounds the volume $V$, we may compute $\psi_0$ at any point in $V$.
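Equation (6.5) can be checked numerically in a case where the answer is known. The sketch below assumes a plane wave $\psi = e^{ikz}$ (a solution of Eq. (6.2)) and takes $A$ to be a sphere of radius $R$ centred on the observation point at the origin, with outward normal $\mathbf{e}_r$; it evaluates the right-hand side of Eq. (6.5) by Simpson's rule, which should return $\psi_0 = 1$ for any $k$ and $R$. The function name, wavelength, and radii are illustrative choices, not values from the text.

```python
import cmath
import math


def kirchhoff_value_at_center(k, radius, n=4000):
    """Right-hand side of Eq. (6.5) for psi = exp(ikz) on a sphere A centred on the origin.

    The observation point is the centre, so r = radius everywhere on A and the outward
    normal n is the radial unit vector.  With u = cos(theta), dA = radius**2 du dphi,
    and the phi integral contributes a factor 2*pi (the integrand does not depend on phi).
    """
    if n % 2:
        n += 1  # Simpson's rule needs an even number of intervals
    g = cmath.exp(1j * k * radius) / radius                              # e^{ikr}/r on A
    dg_dn = cmath.exp(1j * k * radius) * (1j * k / radius - 1.0 / radius**2)
    h = 2.0 / n
    total = 0j
    for i in range(n + 1):
        u = -1.0 + i * h
        psi = cmath.exp(1j * k * radius * u)      # psi = e^{ikz} with z = radius * u
        dpsi_dn = 1j * k * u * psi                # grad(psi) . n = ik cos(theta) psi
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += weight * (psi * dg_dn - g * dpsi_dn)
    surface_integral = 2.0 * math.pi * radius**2 * total * h / 3.0
    return -surface_integral / (4.0 * math.pi)


k = 2.0 * math.pi / 0.5        # hypothetical wavelength of 0.5 (arbitrary length units)
for radius in (0.3, 1.0, 3.0):
    print(radius, kirchhoff_value_at_center(k, radius))   # each close to 1
```

The independence of the result from $R$ is the numerical counterpart of the statement that the field inside the surface is fixed by $\psi$ and $\nabla\psi$ on the surface.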


6.2 APERTURES IN PLANE SCREENS 


In many cases of practical interest the propagation of light is limited by the presence of a plane screen whose transmission and absorption properties are functions of position on the screen. In this section Eq. (6.5) will be adapted to this case, beginning with the situation in which various portions of the screen either totally transmit or totally absorb. We are thus concerned with an opaque screen with one or more apertures cut into it.

In order to make the problem more specific, assume that the incident state is the spherically symmetric state $|S\rangle_{\text{inc}} = (A/r_1)\,e^{ikr_1}$ arising from a point source, $P$, located to the left of the screen. In Fig. 6.3 the surface of integration, $A$, is broken into two parts, $S$ and $S'$. $S'$ is that portion of a spherical surface of radius $R$, centered on the observation point $P'$, which lies to the right of the screen. The portion $S$ is that portion of the screen intersected by $S'$, plus the aperture. In order to use Eq. (6.5), we now require a knowledge of the state function over this surface. Since we do not know the effect of the screen on the incident function, we do not possess this knowledge. We will assume that the total function over the aperture is simply the incident one and that any differences near the edge of the aperture or elsewhere can be neglected. We thus expect that the results derived may not be valid near the aperture, and this is indeed so. In fact, near the aperture the equations to be derived do not reproduce the assumed function upon which they are based. However, the values obtained for points far from the aperture prove to be in very good agreement with experiment.

Fig. 6.3. Geometry and notation used for the study of scalar diffraction by apertures in plane screens.

In addition to knowing $\psi$ over $S$, we must also know it over $S'$. There are two methods of handling this part of the problem. Both allow the radius of the sphere, $R$, to become infinite. The first technique assumes that the disturbance at $P'$ due to the field on a section of $S'$ decreases as $1/R^2$. However, since the surface area of $S'$ increases as $R^2$, one must take some care in showing that these two competing influences do lead to a vanishingly small contribution from $S'$. For a proof of this the student is referred to the bibliography.*


* Stone, John M., Radiation and Optics, McGraw-Hill (1963). 




The second method is more physical in flavor. In any real situation the light has not been incident for an infinite length of time. Hence, it has been able to propagate to the right of the screen only a finite distance. We may simply place the surface $S'$ beyond this distance, and the disturbance will be identically zero over $S'$. This method also has difficulties. Any wave which has not existed for all time cannot be written as a simple monochromatic Fourier superposition. It must be composed of a set of simple monochromatic functions with a finite range of $k$-values. However, if the incident wave does start at some instant in time, we may assume that after a sufficiently long time any transient behavior will disappear and the function at $P'$ will approach the monochromatic case. Since we can place $S'$ as far away as we wish, we can make this approximation as good as we wish. Thus, over the aperture we assume $\psi = (A/r_1)\,e^{ikr_1}$ and over the remainder of both surfaces we assume $\psi = 0$. Equation (6.5) thus becomes


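$$\psi_0 = -\frac{1}{4\pi}\int_{\text{aperture}}\left[\frac{A\,e^{ikr_1}}{r_1}\,\nabla\left(\frac{e^{ikr_2}}{r_2}\right) - \frac{e^{ikr_2}}{r_2}\,\nabla\left(\frac{A\,e^{ikr_1}}{r_1}\right)\right]\cdot\mathbf{n}\,dS,$$
where $r_1$ and $r_2$ are the distances from $P$ and from $P'$, respectively, to the element $dS$.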


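Since the exponential functions are spherically symmetric about $P$ and $P'$, we may write
$$\nabla\left(\frac{e^{ikr_j}}{r_j}\right) = \mathbf{e}_{r_j}\frac{\partial}{\partial r_j}\left(\frac{e^{ikr_j}}{r_j}\right) = \mathbf{e}_{r_j}\left(ik - \frac{1}{r_j}\right)\frac{e^{ikr_j}}{r_j}, \qquad j = 1, 2.$$
From Fig. 6.3 it is seen that
$$\mathbf{e}_{r_2}\cdot\mathbf{n} = \cos\theta, \qquad \mathbf{e}_{r_1}\cdot\mathbf{n} = \cos\phi.$$
We may then write
$$\psi_0 = -\frac{1}{4\pi}\int_{\text{aperture}}\frac{A\,e^{ik(r_1+r_2)}}{r_1 r_2}\left[\left(ik - \frac{1}{r_2}\right)\cos\theta - \left(ik - \frac{1}{r_1}\right)\cos\phi\right]dS.$$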


If we are concerned with visible light, $k \approx 10^5\ \text{cm}^{-1}$, while $1/r$ will usually be smaller by several orders of magnitude. Thus, we ignore the $1/r_1$- and $1/r_2$-terms and obtain the Kirchhoff-Fresnel formula for the diffraction of a monochromatic scalar wave:
$$\psi_0 = -\frac{ik}{4\pi}\int_{\text{aperture}}\frac{A\,e^{ik(r_1+r_2)}}{r_1 r_2}\left[\cos\theta - \cos\phi\right]dS.$$
The term involving the cosines of $\theta$ and $\phi$ is known as the obliquity factor.
Even though we have made a number of approximations in obtaining this result, the integrals which arise are usually of great difficulty. Although it will by no means make the integrals trivial, there is one final approximation which we make. Usually, $P$ and $P'$ are far enough from the screen, and close enough to the axis passing through the aperture normal to the screen, to enable us to approximate
$$\cos\phi \approx -1, \qquad \cos\theta \approx 1.$$


Finally, we obtain
$$\psi_0 = -\frac{ik}{2\pi}\int_{\text{aperture}}\frac{A\,e^{ik(r_1+r_2)}}{r_1 r_2}\,dS. \qquad (6.6)$$


6.3 FRESNEL DIFFRACTION 


In the following sections we will consider the case of Fraunhofer diffraction, in which both $P$ and $P'$ are effectively placed an infinite distance from the screen. In this section we will apply Eq. (6.6), as it stands, with both the source point and the observation point at some finite distances from the screen. However, we are assuming that these distances are large enough to make the obliquity factor constant and equal to 2. In this approximation the possible variation in $r_1$ and $r_2$ is so small that their product in the denominator may be considered constant and taken outside the integral sign. Since $k$ may be extremely large ($10^5\ \text{cm}^{-1}$), even a small variation in $r_1$ or $r_2$ can produce a significant change in the exponent $k(r_1 + r_2)$. Thus, the exponential cannot be considered constant.

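In the spirit of a rather long example, we will now consider a rectangular aperture. We will take this aperture to lie in the $\xi\eta$-plane and the source point, $P$, to be on the $z$-axis. The observation point will have coordinates $x$ and $y$. The geometry is as shown in Fig. 6.4. Writing out $r_1$ and $r_2$, where $r_{10}$ and $r_{20}$ are the perpendicular distances of $P$ and $P'$ from the plane of the aperture, we have
$$r_1 = \sqrt{r_{10}^2 + \xi^2 + \eta^2}, \qquad r_2 = \sqrt{r_{20}^2 + (\xi - x)^2 + (\eta - y)^2}.$$
Assuming that the aperture is small compared to $r_{10}$ and $r_{20}$, these expressions may be approximated using $\sqrt{1 + \alpha} \approx 1 + \alpha/2$, which is valid for $\alpha \ll 1$:
$$r_1 \approx r_{10} + \frac{\xi^2 + \eta^2}{2r_{10}}, \qquad r_2 \approx r_{20} + \frac{(\xi - x)^2 + (\eta - y)^2}{2r_{20}}.$$
Thus, the sum in the exponent becomes
$$\begin{aligned}
r_1 + r_2 \approx{}& r_{10} + r_{20} + \frac{x^2 + y^2}{2r_{20}} + \frac{1}{2}\left(\frac{1}{r_{10}} + \frac{1}{r_{20}}\right)\xi^2 + \frac{1}{2}\left(\frac{1}{r_{10}} + \frac{1}{r_{20}}\right)\eta^2 \\
&- \frac{\xi x}{r_{20}} - \frac{\eta y}{r_{20}}.
\end{aligned} \qquad (6.7)$$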




Fig. 6.4. The special case of a rectangular aperture in a plane screen. 


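If we complete the square for both quadratic terms in $\xi$ and $\eta$, we obtain
$$\begin{aligned}
r_1 + r_2 \approx{}& r_{10} + r_{20} + \frac{x^2 + y^2}{2(r_{20} + r_{10})} \\
&+ \frac{r_{10} + r_{20}}{2r_{10}r_{20}}\left[\left(\xi - \frac{r_{10}x}{r_{10} + r_{20}}\right)^2 + \left(\eta - \frac{r_{10}y}{r_{10} + r_{20}}\right)^2\right].
\end{aligned} \qquad (6.8)$$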
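A quick numerical sanity check of Eqs. (6.7) and (6.8) may be useful: the two expressions should agree to rounding error, and both should differ from the exact sum of the square roots only by terms of fourth order in the transverse coordinates. The geometry below (distances in centimeters) is hypothetical.

```python
import math


def exact_sum(xi, eta, x, y, r10, r20):
    """r1 + r2 from the square-root expressions (source on the z-axis)."""
    r1 = math.sqrt(r10**2 + xi**2 + eta**2)
    r2 = math.sqrt(r20**2 + (xi - x)**2 + (eta - y)**2)
    return r1 + r2


def paraxial_sum_67(xi, eta, x, y, r10, r20):
    """Eq. (6.7)."""
    c = 0.5 * (1.0 / r10 + 1.0 / r20)
    return (r10 + r20 + (x**2 + y**2) / (2.0 * r20)
            + c * (xi**2 + eta**2) - (xi * x + eta * y) / r20)


def paraxial_sum_68(xi, eta, x, y, r10, r20):
    """Eq. (6.8): the same expression with the squares completed."""
    s = r10 + r20
    shift_x, shift_y = r10 * x / s, r10 * y / s
    return (s + (x**2 + y**2) / (2.0 * s)
            + s / (2.0 * r10 * r20) * ((xi - shift_x)**2 + (eta - shift_y)**2))


args = (0.2, -0.1, 0.3, 0.05, 100.0, 150.0)   # xi, eta, x, y, r10, r20 in cm (hypothetical)
print(exact_sum(*args), paraxial_sum_67(*args), paraxial_sum_68(*args))
```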


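Equation (6.6) may now be written
$$\begin{aligned}
\psi_0 = -\frac{ikA'}{2\pi r_{10}r_{20}} &\int_{\xi_1}^{\xi_2} d\xi\,\exp\left[\frac{ik(r_{10} + r_{20})}{2r_{10}r_{20}}\left(\xi - \frac{r_{10}x}{r_{10} + r_{20}}\right)^2\right] \\
&\times\int_{\eta_1}^{\eta_2} d\eta\,\exp\left[\frac{ik(r_{10} + r_{20})}{2r_{10}r_{20}}\left(\eta - \frac{r_{10}y}{r_{10} + r_{20}}\right)^2\right],
\end{aligned}$$
with
$$A' = A\exp\left\{ik\left[r_{10} + r_{20} + \frac{x^2 + y^2}{2(r_{20} + r_{10})}\right]\right\}.$$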
If we change the variables of integration to
$$\xi' = \xi - \frac{r_{10}x}{r_{10} + r_{20}}, \qquad \eta' = \eta - \frac{r_{10}y}{r_{10} + r_{20}},$$
then the integrals have the same form as would be obtained by setting $x$ and $y$ equal to zero, but with the limits of integration changed to
$$\xi'_{1,2} = \xi_{1,2} - \frac{r_{10}x}{r_{10} + r_{20}}, \qquad \eta'_{1,2} = \eta_{1,2} - \frac{r_{10}y}{r_{10} + r_{20}}.$$

Fig. 6.5. The transformation from the variable $\xi$ to the variable $\xi'$ is equivalent to a transformation which sets the $x$-coordinate of point $P'$ equal to zero.


From Fig. 6.5 it is evident that this transformation is equivalent to rotating the $z$-axis to join $P$ and $P'$, ignoring the fact that the aperture plane is then not exactly normal to the $z$-axis. An alternative point of view is that the diffraction pattern as a whole shifts in the manner that one would expect from simple geometric optics when the source or aperture is shifted. Since this is the case, we may confine our attention to the case where $x$ and $y$ are zero. Finally, to get the integrals into standard form we make the substitutions
$$\alpha = \xi'\sqrt{\frac{2(r_{10} + r_{20})}{\lambda r_{10}r_{20}}}, \qquad \beta = \eta'\sqrt{\frac{2(r_{10} + r_{20})}{\lambda r_{10}r_{20}}}, \qquad (6.9)$$
and obtain
$$\psi_0 = -\frac{iA'}{2(r_{10} + r_{20})}\int_{\alpha_1}^{\alpha_2} e^{i\pi\alpha^2/2}\,d\alpha\int_{\beta_1}^{\beta_2} e^{i\pi\beta^2/2}\,d\beta. \qquad (6.10)$$




6.4 THE CORNU SPIRAL 


The integrals in Eq. (6.10) cannot be evaluated in terms of elementary functions but must be computed by numerical methods. Since the imaginary exponential can be written in terms of the sine and cosine, this evaluation is usually done in terms of the Fresnel integrals,
$$C(u) = \int_0^u \cos\frac{\pi t^2}{2}\,dt, \qquad S(u) = \int_0^u \sin\frac{\pi t^2}{2}\,dt. \qquad (6.11)$$
In terms of these integrals Eq. (6.10) becomes
$$\begin{aligned}
\psi_0 = -\frac{i}{2}\,\psi_\infty &\left[C(\alpha_2) - C(\alpha_1) + iS(\alpha_2) - iS(\alpha_1)\right] \\
&\times\left[C(\beta_2) - C(\beta_1) + iS(\beta_2) - iS(\beta_1)\right],
\end{aligned} \qquad (6.12)$$
where
$$\psi_\infty = \frac{A}{r_{10} + r_{20}}$$


is the value the state function would have at $P'$ if the screen were completely removed, ignoring a phase difference between $A$ and $A'$. For a numerical tabulation of the Fresnel integrals the reader is referred to the literature.* It is possible to present the contents of such a tabulation in graphical form, as is done in Fig. 6.6, which represents a complex plane in which the real axis takes on the values of $C(u)$ and the imaginary axis those of $S(u)$. The resulting curve, which is known as Cornu's spiral, is obtained by considering $u$ as a parameter. For any value of $u$ one can immediately read off the values of $C(u)$ and $S(u)$. We now wish to show how one may use the Cornu spiral to obtain a diffraction pattern by graphical means. If we think of
$$\int_0^{\alpha} e^{i\pi t^2/2}\,dt$$
as a summation of infinitesimal complex vectors $e^{i\pi t^2/2}\,dt$, then the complex number which is the value of this definite integral is simply the vector drawn from the origin to the point on the spiral labeled by $\alpha$. Likewise, since
$$\int_{\alpha_1}^{\alpha_2} = \int_0^{\alpha_2} - \int_0^{\alpha_1},$$
the integral from $\alpha_1$ to $\alpha_2$ is the complex vector drawn from the point labeled by $\alpha_1$ to that labeled by $\alpha_2$. This is illustrated in Fig. 6.7. In addition, the length of the complex vector $e^{i\pi t^2/2}\,dt$ is simply $dt$. Thus, the path length along the spiral from the origin to the point labeled by $\alpha$ is just $\alpha$. Since these vectors are a representation for complex numbers, it is important to remember that the length of the vector gives the magnitude of the complex number and that its phase angle is given by the angle between the vector and the real axis, which is indicated by $\phi$ in Fig. 6.7. All of this holds equally well for the evaluation of the integral in $\beta$. After the appropriate values of $\alpha$ and $\beta$ have been obtained from Eq. (6.9), it is only necessary to obtain the two related complex vectors and multiply them together. The product of two such vectors is, of course, a vector whose length is the product of the lengths and whose phase angle is the sum of the phase angles.

* Jahnke, E., and J. Emde, Tables of Functions, Dover Publications, New York (1945).

Fig. 6.6. The Cornu spiral: $S(u)$ plotted against $C(u)$, with $u$ as a parameter.

Fig. 6.7. Addition of complex vectors on the Cornu spiral.
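For readers without printed tables at hand, the Fresnel integrals of Eq. (6.11) are easy to compute. The sketch below uses composite Simpson's rule (a choice made for this illustration, not the method behind the published tables) to return $C(u) + iS(u)$, i.e., the point of the Cornu spiral labeled $u$, and forms the complex vector between two labeled points as described above. The function names are hypothetical.

```python
import cmath


def fresnel_z(u, n=2000):
    """Return C(u) + i S(u) of Eq. (6.11): the point of the Cornu spiral labelled u.

    Composite Simpson's rule on exp(i pi t^2 / 2) from 0 to u (u may be negative).
    """
    if n % 2:
        n += 1
    h = u / n
    total = 0j
    for i in range(n + 1):
        t = i * h
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += weight * cmath.exp(1j * cmath.pi * t * t / 2.0)
    return total * h / 3.0


def cornu_vector(a1, a2):
    """Complex vector from the spiral point labelled a1 to the point labelled a2."""
    return fresnel_z(a2) - fresnel_z(a1)


z = fresnel_z(1.0)
print(f"C(1) = {z.real:.6f}, S(1) = {z.imag:.6f}")
v = cornu_vector(-0.5, 1.5)
print(f"length = {abs(v):.4f}, phase angle = {cmath.phase(v):.4f} rad")
```

Multiplying two such vectors, one for $\alpha$ and one for $\beta$, and the factor $-\tfrac{i}{2}\psi_\infty$ gives Eq. (6.12) directly.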

In the following sections we are going to examine several special cases which will appear to violate the assumptions under which Eq. (6.10) was derived. Before proceeding we wish to discuss the extent to which Eq. (6.10) is really limited by these assumptions. In purely geometric optics there is no diffraction. Under the assumption of rectilinear propagation any aperture will cast a shadow whose boundary marks a discontinuous jump in intensity. Scalar diffraction theory replaces this perfectly defined shadow with a continuous transition in intensity and an imperfectly defined shadow. Yet from everyday experience one knows that the geometric theory cannot be all bad. Shadows and boundaries do exist. We are led to expect, therefore, that the intensity in a diffraction pattern is determined mainly by some small area in the diffracting screen lying near a line joining the source point and the observation point, and is little influenced by those parts of the screen which are far from this line. In the Cornu spiral it is seen that as $\alpha$ and $\beta$ become large the spiral closes in to the value $\tfrac{1}{2}$ for both $C(u)$ and $S(u)$, at point $B$, or to the value $-\tfrac{1}{2}$ at point $A$. Once $\alpha$ and $\beta$ are large, making them larger has little effect. The meaning of this is that if large values of $\alpha$ and $\beta$ can be obtained from Eq. (6.9) without violating the assumptions which have been made, making them still larger will violate the assumptions and introduce errors, but errors which can be small enough to be ignored. For example, assume a small square aperture, with $P$ and $P'$ both 1 meter away from the screen, such that the aperture subtends $1^\circ$ from $P'$. We find for visible light, $\lambda = 5\times10^{-5}$ cm, that $\alpha$ and $\beta$ are approximately 50. Thus, as we now enlarge the aperture to a size which would violate the assumption of a constant obliquity factor, we will not introduce a large error. In fact, in certain cases we introduce no error at all. If we remove the screen, then the edges of the aperture are at infinity and $\alpha_2 = \beta_2 = \infty$, $\alpha_1 = \beta_1 = -\infty$. Thus, in Eq. (6.12) the two complex vectors are both $(1 + i)$, giving, since $-\tfrac{i}{2}(1 + i)^2 = 1$,
$$\psi_0 = \psi_\infty,$$
which is, of course, the exact result.
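The estimate of $\alpha \approx 50$ quoted above is simple arithmetic with Eq. (6.9). The sketch below reproduces it under one stated assumption: "subtends $1^\circ$" is read as the aperture edge lying $1^\circ$ off axis as seen from $P'$. Reading it instead as a total angle of $1^\circ$ (edges at $\pm 0.5^\circ$) roughly halves the value, which is still large on the scale of the Cornu spiral.

```python
import math


def fresnel_parameter(offset, r10, r20, wavelength):
    """Eq. (6.9): alpha (or beta) for a point a distance `offset` from the axis."""
    return offset * math.sqrt(2.0 * (r10 + r20) / (wavelength * r10 * r20))


r10 = r20 = 100.0                            # cm: P and P' both 1 m from the screen
wavelength = 5e-5                            # cm
edge = r20 * math.tan(math.radians(1.0))     # assumed: edge 1 degree off axis as seen from P'
print(round(fresnel_parameter(edge, r10, r20, wavelength), 1))   # 49.4
```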




6.5 THE STRAIGHT EDGE AND SLIT 


The two remaining cases which we will consider are one-dimensional. That is, we will allow the aperture to become infinite in the $\xi$-direction, so that $\alpha_1 = -\infty$ and $\alpha_2 = +\infty$. In this case the complex vector associated with $\alpha$ is $(1 + i)$, and Eq. (6.12) becomes
$$\psi_0 = \frac{1 - i}{2}\,\psi_\infty\left[C(\beta_2) - C(\beta_1) + iS(\beta_2) - iS(\beta_1)\right]. \qquad (6.13)$$
We will return shortly to this expression, which yields the diffraction pattern of an infinite slit. Now, however, we will also remove one edge of the slit to infinity, leaving a single straight edge. If we remove the edge at $\eta_1$, we obtain
$$\psi_0 = \frac{1 - i}{2}\,\psi_\infty\left[\tfrac{1}{2}(1 + i) + C(\beta_2) + iS(\beta_2)\right].$$
If the observation point is very far to the right of the remaining edge, then, as in the preceding example, we have a vector from $A$ to $B$ and we obtain the full incident intensity. As we move the observation point to the left, or bring the edge at $\eta_2$ closer to the axis, the tip of the vector moves around $B$, oscillating in length until it finally passes through the origin of the spiral. At this point $\beta_2$ is zero and we are at the location of the geometric shadow. Continuing into the shadow, the oscillation ceases and the magnitude of the vector decreases monotonically. The measured intensity, which is $|\psi_0|^2$, is given in Fig. 6.8, with $\psi_\infty = 1$.
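The straight-edge expression can be evaluated directly. The Python sketch below (function names are illustrative; Simpson's rule is used for $C$ and $S$) computes $|\psi_0/\psi_\infty|^2$ as a function of $\beta_2$. At the geometric shadow, $\beta_2 = 0$, it gives exactly $\tfrac{1}{4}$; it rises above 1 in the first bright fringe on the illuminated side and falls off monotonically inside the shadow, in line with the description of Fig. 6.8.

```python
import cmath


def fresnel_z(u, n=2000):
    """C(u) + i S(u) from Eq. (6.11), by composite Simpson's rule."""
    if n % 2:
        n += 1
    h = u / n
    total = 0j
    for i in range(n + 1):
        t = i * h
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += weight * cmath.exp(1j * cmath.pi * t * t / 2.0)
    return total * h / 3.0


def straight_edge_intensity(beta2):
    """|psi_0 / psi_inf|^2 for a single straight edge (edge at eta_1 removed)."""
    psi_ratio = (1 - 1j) / 2 * (0.5 * (1 + 1j) + fresnel_z(beta2))
    return abs(psi_ratio) ** 2


for beta2 in (-2.0, -1.0, 0.0, 1.2, 2.0, 3.0):
    print(f"beta2 = {beta2:+.1f}   relative intensity = {straight_edge_intensity(beta2):.4f}")
```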

Returning now to Eq. (6.13), we may consider the pattern due to two straight edges forming a single slit. As we move the observation point across the pattern, we are changing $\eta_1$ and $\eta_2$, but the difference $\eta_2 - \eta_1$ must, of course, be constant. Thus, on the Cornu spiral the length of the vector will vary but the distance along the spiral will remain constant. To obtain the slit pattern, we first compute $\beta_2 - \beta_1$ and slide an arc of this length along the spiral, taking the distance between the ends of the arc as the magnitude of the state function. Figure 6.9 gives a series of typical single-slit diffraction patterns differing only in the width of the slit. The slit width increases from top to bottom. Notice that the center of the pattern is not necessarily a maximum or a minimum of the pattern but can be either, depending on the slit width.

Fig. 6.8. The diffraction pattern of a straight edge. (After Towne, Wave Phenomena, Addison-Wesley, Reading, Mass., 1967.)

Fig. 6.9. ... progressively wider slits. (From Towne, Wave Phenomena, Addison-Wesley, Reading, Mass., 1967.)
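The sliding-arc construction translates directly into code. In the sketch below (an illustration with hypothetical names), the slit occupies a fixed length of the $\beta$ axis, and moving the observation point corresponds to sliding the centre of that interval; by Eq. (6.13), the relative intensity is half the squared length of the chord of the Cornu spiral. The on-axis intensity printed for several widths rises and falls rather than growing steadily with the width, consistent with the remark that the centre of the pattern may be bright or dark.

```python
import cmath


def fresnel_z(u, n=2000):
    """C(u) + i S(u) from Eq. (6.11), by composite Simpson's rule."""
    if n % 2:
        n += 1
    h = u / n
    total = 0j
    for i in range(n + 1):
        t = i * h
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += weight * cmath.exp(1j * cmath.pi * t * t / 2.0)
    return total * h / 3.0


def slit_intensity(center, width):
    """Relative intensity |psi_0/psi_inf|^2 behind a slit, from Eq. (6.13).

    The slit covers beta in [center - width/2, center + width/2]; moving the
    observation point slides this interval (fixed arc length) along the spiral.
    Since |(1 - i)/2|^2 = 1/2, the intensity is half the squared chord length.
    """
    chord = fresnel_z(center + width / 2.0) - fresnel_z(center - width / 2.0)
    return abs(chord) ** 2 / 2.0


# On-axis intensity (arc centred on the spiral's origin) for several slit widths.
for width in (0.5, 1.0, 2.0, 3.0, 4.0):
    print(f"width = {width:.1f}   centre intensity = {slit_intensity(0.0, width):.3f}")

# A cross-section of the pattern for one width, sampled every 0.1 in beta.
pattern = [slit_intensity(c / 10.0, 3.0) for c in range(-40, 41)]
```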


6.6 FRAUNHOFER DIFFRACTION 


The integrals obtained in the case of Fresnel diffraction posed particularly difficult mathematical problems because of the quadratic dependence of the exponent on the variables $\xi$ and $\eta$. We now show that this dependence can be reduced to a linear one by the use of an appropriate lens system. This case is known as Fraunhofer diffraction. It can be shown that this is equivalent to allowing the distances $r_1$ and $r_2$ to become infinite.

We regroup the terms in Eq. (6.6):
$$\psi_0 = -\frac{ik}{2\pi}\int dS\left(\frac{A\,e^{ikr_1}}{r_1}\right)\left(\frac{e^{ikr_2}}{r_2}\right). \qquad (6.14)$$


The first factor in parentheses is the value of the state function at the element $dS$ in the aperture. In the preceding section we considered the extreme case where the aperture was totally open and the screen totally absorbing. Let us now consider the more general case where the transmission through the aperture can take on any value. This can be accomplished by placing a screen over the aperture whose transmission properties are a function of position. It would then be reasonable to include in the integral a function which adjusts the amplitude and phase to their new values at each point within the aperture. Doing this, we rewrite Eq. (6.14) as
$$\psi_0 = -\frac{ikA}{2\pi}\int dS\,u(\xi, \eta)\,\frac{e^{ik(r_1 + r_2)}}{r_1 r_2}, \qquad (6.15)$$
where the amplitude of the complex function $u(\xi, \eta)$ is a measure of the absorption properties of the screen at the point $\xi, \eta$ and the phase of this function determines any retardation of the wave at that point.

In going from Eq. (6.14) to Eq. (6.15) it must be remembered that Eq. 
(6.14) demands that the screen be illuminated by a single point source. This 
ensures that the phase difference between different portions of the screen is 
fixed or, in the sense of Chapter 4, that the illumination is coherent. This 
will be important for much of what follows. We also point out that Eq. 
(6.15) implies that a knowledge of the state function over the aperture is 
sufficient to determine the diffraction pattern. The Green's theorem method, 
however, assumed a knowledge of both the function and its gradient. It would seem now that this much knowledge is not really necessary. This is indeed the case, and the details of obtaining Eq. (6.14) without presupposing a knowledge of the gradient of the incident state are left to the problems.

Fig. 6.10. The parameter $\rho$ for a ray passing through a thin lens. The quantity $\rho_0$ is the radius of the lens.
Within the diffracting aperture we now place a single converging thin lens of focal length $f$, chosen such that the points $P$ and $P'$ are conjugate, i.e.,
$$\frac{1}{f} = \frac{1}{r_{10}} + \frac{1}{r_{20}}. \qquad (6.16)$$
We consider this lens as a plate of varying thickness which produces various phase shifts in the light it transmits. We must therefore calculate the thickness of the lens as a function of the distance from the axis. In Fig. 6.10 the lens is shown with its thickness and its radius, $\rho_0$, greatly exaggerated. In an actual case $\rho_0$ is small enough compared to $r_{10}$ and $r_{20}$ to enable all rays passing through the lens to be taken as parallel to the axis. We now calculate the thickness of the lens in terms of the radii of curvature of its surfaces. This is most easily accomplished by splitting the lens along its median plane and calculating the thickness of each half separately. The left-hand half is shown in Fig. 6.11, again exaggerated. The thickness, $t$, a distance $\rho$ from the axis is written $t = T - \delta$. If $R$ is the radius of curvature of the spherical surface, this may be written
$$t = \left(R - \sqrt{R^2 - \rho_0^2}\right) - \left(R - \sqrt{R^2 - \rho^2}\right),$$
using the Pythagorean theorem. Factoring $R$ from the radicals, we obtain
$$t = R\left(\sqrt{1 - \frac{\rho^2}{R^2}} - \sqrt{1 - \frac{\rho_0^2}{R^2}}\right),$$
and if both $\rho$ and $\rho_0$ are small compared to $R$, we may use the approximation $\sqrt{1 + \alpha} \approx 1 + \alpha/2$, which is valid for $\alpha \ll 1$, and obtain
$$t = \frac{1}{2R}\left(\rho_0^2 - \rho^2\right).$$

Fig. 6.11. A plano-convex lens with exaggerated thickness.
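The size of the error made in dropping the higher-order terms can be checked numerically. The sketch below compares the exact thickness of one half of the lens with the paraxial result for a hypothetical surface of radius 10 cm and lens radius 0.5 cm; the difference is of the order of $\rho_0^4/8R^3$, below $10^{-5}$ cm here.

```python
import math


def half_lens_thickness_exact(rho, rho0, R):
    """Thickness of one half of the lens at distance rho from the axis (zero at the rim rho0)."""
    return math.sqrt(R**2 - rho**2) - math.sqrt(R**2 - rho0**2)


def half_lens_thickness_approx(rho, rho0, R):
    """Paraxial result t = (rho0^2 - rho^2) / (2R)."""
    return (rho0**2 - rho**2) / (2.0 * R)


R, rho0 = 10.0, 0.5     # cm; hypothetical radius of curvature and lens radius
for rho in (0.0, 0.2, 0.4):
    exact = half_lens_thickness_exact(rho, rho0, R)
    approx = half_lens_thickness_approx(rho, rho0, R)
    print(f"rho = {rho:.1f} cm   exact = {exact:.8f}   approx = {approx:.8f}   "
          f"difference = {exact - approx:.1e}")
```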


For the original lens, whose first and second surfaces have radii $R_1$ and $R_2$, we thus obtain
$$t = \frac{\rho_0^2 - \rho^2}{2}\left(\frac{1}{R_1} - \frac{1}{R_2}\right).$$
The minus sign arises from the sign conventions introduced in Chapter 1 for the radii. If the lens is of refractive index $n$, the difference between the optical path length through the lens at a distance $\rho$ from the axis and that of the same path in air is
$$\Delta L = (n - 1)\,\frac{\rho_0^2 - \rho^2}{2}\left(\frac{1}{R_1} - \frac{1}{R_2}\right)$$
or, using Eq. (1.30),
$$\Delta L = \frac{\rho_0^2 - \rho^2}{2f}. \qquad (6.17)$$
The phase shift introduced by this thickness is $k\,\Delta L$. After insertion of the lens in or near the diffracting aperture, Eq. (6.15) becomes
$$\psi_0 = -\frac{ikA}{2\pi}\int d\xi\,d\eta\,\frac{e^{ik(r_1 + r_2 + \Delta L)}}{r_1 r_2}\,u(\xi, \eta). \qquad (6.18)$$


We now approximate $r_1 + r_2$ as in Eq. (6.7), but instead of completing the square as in Eq. (6.8) we regroup the terms to obtain
$$r_1 + r_2 \approx r_{10} + \left[r_{20} + \frac{x^2 + y^2}{2r_{20}}\right] + \frac{1}{2}\left(\frac{1}{r_{10}} + \frac{1}{r_{20}}\right)(\xi^2 + \eta^2) - \frac{\xi x + \eta y}{r_{20}}. \qquad (6.19)$$
If $r$ is taken to be the distance from the center of the lens to the point $x, y$ in the diffraction pattern, its magnitude is given by
$$r = \sqrt{r_{20}^2 + x^2 + y^2},$$
which in the usual fashion may be approximated by
$$r \approx r_{20} + \frac{1}{2r_{20}}\left(x^2 + y^2\right).$$
This is recognized as the term in brackets in Eq. (6.19). Noting that $\rho^2 = \xi^2 + \eta^2$ and using Eqs. (6.16) and (6.17), we are able to write
$$r_1 + r_2 + \Delta L \approx \left(r_{10} + \frac{\rho_0^2}{2f}\right) + r - \frac{\xi x + \eta y}{r_{20}}.$$
After we drop the constant terms $r_{10} + \rho_0^2/2f$ in the exponent, Eq. (6.18) becomes
$$\psi(x, y) = -\frac{ikA\,e^{ikr}}{2\pi r_{10}r_{20}}\iint d\xi\,d\eta\,u(\xi, \eta)\,e^{-ik(\xi x + \eta y)/r_{20}}, \qquad (6.20)$$
where the integration is to be taken over the aperture. If we make the change of variables
$$\alpha = \frac{k\xi}{r_{20}}, \qquad \beta = \frac{k\eta}{r_{20}},$$
and define $u(\alpha, \beta)$ to be equal to $u(\xi, \eta)$ over the aperture and zero elsewhere, we obtain
$$\psi(x, y) = -\frac{iA\,r_{20}\,e^{ikr}}{2\pi k\,r_{10}}\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} d\alpha\,d\beta\,u(\alpha, \beta)\,e^{-i(\alpha x + \beta y)}. \qquad (6.21)$$
Thus, the diffraction pattern found in the image plane of the lens is, apart from a multiplicative factor, the Fourier transform of the distribution over the aperture. The significance of this remark for the theory of image formation is profound, as we shall see.
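The Fourier-transform statement can be checked numerically for the simplest aperture. The sketch below (Simpson's rule; names hypothetical) evaluates the one-dimensional transform $\int u(\alpha)\,e^{-i\alpha x}\,d\alpha$ of a fully open aperture $|\alpha| \le \alpha_0$ and compares it with the elementary result $2\sin(\alpha_0 x)/x$. Any other transmission function $u(\alpha)$, complex if it retards the wave, can be passed in the same way.

```python
import cmath
import math


def fraunhofer_amplitude(u, alpha_min, alpha_max, x, n=2000):
    """Integral of u(alpha) exp(-i alpha x) over [alpha_min, alpha_max] (1-D Eq. (6.21)).

    u is the aperture transmission function, taken to vanish outside the interval.
    The constant prefactor of Eq. (6.21) is omitted.  Composite Simpson's rule.
    """
    if n % 2:
        n += 1
    h = (alpha_max - alpha_min) / n
    total = 0j
    for i in range(n + 1):
        a = alpha_min + i * h
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += weight * u(a) * cmath.exp(-1j * a * x)
    return total * h / 3.0


def open_aperture(alpha):
    """Fully transmitting aperture: unit amplitude, no retardation."""
    return 1.0


alpha0 = 3.0
for x in (0.0, 0.5, 1.0, math.pi / alpha0):
    numeric = fraunhofer_amplitude(open_aperture, -alpha0, alpha0, x)
    exact = 2.0 * alpha0 if x == 0 else 2.0 * math.sin(alpha0 * x) / x
    print(f"x = {x:.4f}   numerical = {numeric.real:+.6f}   2 sin(alpha0 x)/x = {exact:+.6f}")
```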


6.7 THE RECTANGULAR APERTURE 


We assume a rectangular aperture symmetric about the $z$-axis with edges located so that $\alpha$ and $\beta$ at the edges take on the values $\pm\alpha_0$ and $\pm\beta_0$. For an open aperture $u(\alpha, \beta)$ is set equal to unity over the aperture and zero elsewhere. If all the constants in Eq. (6.21) are grouped into a constant $C$, the integrals become
$$\psi(x, y) = C\int_{-\alpha_0}^{\alpha_0}\int_{-\beta_0}^{\beta_0} d\alpha\,d\beta\,e^{-i\alpha x}\,e^{-i\beta y} = C'\,\frac{\sin\alpha_0 x}{\alpha_0 x}\,\frac{\sin\beta_0 y}{\beta_0 y},$$
where $C' = 4\alpha_0\beta_0 C$. If the intensity of the incident state is adjusted so that the intensity in the diffraction pattern at $x = 0$, $y = 0$ is unity, the intensity in the diffraction pattern is then given by
$$I(x, y) = \left(\frac{\sin\alpha_0 x}{\alpha_0 x}\right)^2\left(\frac{\sin\beta_0 y}{\beta_0 y}\right)^2. \qquad (6.22)$$

Fig. 6.12. Fraunhofer diffraction pattern of a single slit. (After Rossi, Optics, Addison-Wesley, Reading, Mass., 1957.)
A plot of the function $(\sin\alpha_0 x/\alpha_0 x)^2$ is given in Fig. 6.12. The zeros of this function obviously occur for $\alpha_0 x = n\pi$, $n \neq 0$. It is left to the problems to show that the locations of the secondary maxima are given by the solutions of the transcendental equation

$$\tan\alpha_0 x = \alpha_0 x. \tag{6.23}$$

The first few solutions of this equation are given in Table 6.1.

Table 6.1
Maximum and minimum values of the function $\sin\alpha_0 x_n/\alpha_0 x_n$, and the associated values of $(\sin\alpha_0 x_n/\alpha_0 x_n)^2$

 n     α0·x_n     sin(α0·x_n)/(α0·x_n)     [sin(α0·x_n)/(α0·x_n)]²
 1      0.0             1.0                       1.0
 2      4.4934         -0.2171                    0.0472
 3      7.7253         +0.1284                    0.0166
 4     10.9041         -0.0913                    0.00834
 5     14.0662         +0.0709                    0.00503
10     29.8116         -0.0335                    0.00112
15     45.5311         +0.0220                    0.000484
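The roots listed in Table 6.1 can be reproduced with a short bisection sketch. To avoid the poles of $\tan z$, it looks for zeros of $\sin z - z\cos z$ (equivalent to Eq. (6.23) wherever $\cos z \neq 0$) inside each interval $(m\pi, m\pi + \pi/2)$; the function name `secondary_max` is hypothetical. At such a root $\sin z/z = \cos z$, which is a convenient cross-check on the third column.

```python
import math

def secondary_max(m):
    """Root of tan z = z in (m*pi, m*pi + pi/2), m >= 1, by bisection on
    g(z) = sin z - z cos z, which has the same roots but no poles."""
    def g(z):
        return math.sin(z) - z * math.cos(z)
    lo, hi = m * math.pi, m * math.pi + math.pi / 2
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if g(lo) * g(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# Table 6.1 row n corresponds to the root in (m*pi, m*pi + pi/2) with m = n - 1.
for n in (2, 3, 4, 5, 10, 15):
    z = secondary_max(n - 1)
    s = math.sin(z) / z          # equals cos z at a root of tan z = z
    print(f"{n:>2}  {z:8.4f}  {s:+.4f}  {s * s:.5f}")
```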


Fig. 6.13. A model for a diffraction grating. Slits of width $w$ are separated by opaque regions of width $d - w$.


6.8 THE DIFFRACTION GRATING 


A simple model of a transmission diffraction grating consists of alternating opaque and transparent strips, as shown in Fig. 6.13. The width of each slit is $w$ and the distance between slits is $d$. Since the problem is basically one-dimensional, we ignore the integral over $\beta$. If the constant terms are dropped, Eq. (6.21) becomes

$$\psi(x) = \int d\alpha\; u(\alpha)\, e^{-i\alpha x} = \sum_{n=-N}^{+N} \int_{nb - a/2}^{nb + a/2} d\alpha\; e^{-i\alpha x},$$

where each integral in the sum is over one slit, with $a = kw/r_{20}$ and $b = kd/r_{20}$. If there are $N$ slits to the right and to the left of the center slit, the total number of slits, $\mathcal{N}$, is given by $\mathcal{N} = 2N + 1$. Performing the integration, we obtain

$$\psi(x) = a\,\frac{\sin(xa/2)}{xa/2} \sum_{n=-N}^{+N} e^{-ixnb}.$$

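The sum is recognized as a geometric series with ratio $e^{-ixb}$. Making use of the rule for summing such a series,

$$\sum_{n=-N}^{+N} e^{-ixnb} = \frac{\sin(\mathcal{N}xb/2)}{\sin(xb/2)},$$

we obtain

$$\psi(x) = a\,\frac{\sin(xa/2)}{xa/2}\,\frac{\sin(\mathcal{N}xb/2)}{\sin(xb/2)}, \tag{6.24}$$

which may be normalized such that

$$I(x) = \left(\frac{\sin(xa/2)}{xa/2}\right)^2\left(\frac{\sin(\mathcal{N}xb/2)}{\sin(xb/2)}\right)^2. \tag{6.25}$$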


Fig. 6.14. Fraunhofer diffraction pattern of a grating with $\mathcal{N} = 10$. (From Rossi, Optics, Addison-Wesley, Reading, Mass., 1957.)


If $\mathcal{N}$ is set equal to one, the first portion of this pattern is recognized as the Fraunhofer diffraction pattern of a single slit. It is to be compared with the $x$-dependence of the rectangular aperture pattern and is seen to be the function already tabulated in Table 6.1. If, on the other hand, $\mathcal{N}$ is large, then the first portion of the function is slowly varying compared to the second. For the moment we shall consider that it may be treated as a constant. A plot of the second portion of the function is given in Fig. 6.14. The large principal maxima occur when the sine functions in both numerator and denominator are zero. This occurs when $xb/2 = n\pi$. At these points the value of the function is $\mathcal{N}^2$. It is left as an exercise to show that the secondary maxima are located by the solutions of the transcendental equation

$$\tan\frac{\mathcal{N}bx}{2} = \mathcal{N}\tan\frac{bx}{2}. \tag{6.26}$$

There are $\mathcal{N} - 1$ zeros located between each pair of principal maxima; between the maxima at $x = 0$ and $x = 2\pi/b$ they lie at

$$x = \frac{2n\pi}{\mathcal{N}b} \qquad (n = 1, 2, 3, \ldots, \mathcal{N} - 1).$$

If $\mathcal{N}$ is very large, the value of $xb/2$ for the first few secondary maxima near the central principal maximum is small and one may approximate, with increasing accuracy as $\mathcal{N}$ increases,

$$\tan\frac{bx}{2} \approx \sin\frac{bx}{2} \approx \frac{bx}{2}.$$

If we make this approximation in the denominator of Eqs. (6.24) and (6.25), the second portion of the function becomes that tabulated in Table 6.1 multiplied by the constant $\mathcal{N}$ (by $\mathcal{N}^2$ in the intensity of Eq. (6.25)).
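The large-$\mathcal{N}$ limit can be checked by solving Eq. (6.26) directly. This sketch bisects for the first secondary maximum between the zeros $bx/2 = \pi/\mathcal{N}$ and $2\pi/\mathcal{N}$, and reports $\mathcal{N}bx/2$ together with the peak intensity divided by $\mathcal{N}^2$; as $\mathcal{N}$ grows these should approach the Table 6.1 entries 4.4934 and 0.0472. The function name and the sample values of $\mathcal{N}$ (written `M` in the code) are assumptions for illustration.

```python
import math

def first_secondary_max(M):
    """Solve Eq. (6.26), tan(M t) = M tan(t) with t = b x / 2, between the first
    two zeros t = pi/M and t = 2 pi/M. Bisection is applied to
    h(t) = M sin(t) cos(M t) - sin(M t) cos(t), which avoids the poles of tan."""
    def h(t):
        return M * math.sin(t) * math.cos(M * t) - math.sin(M * t) * math.cos(t)
    lo, hi = math.pi / M, 2 * math.pi / M
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if h(lo) * h(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

for M in (11, 101, 1001):
    t = first_secondary_max(M)
    ratio = (math.sin(M * t) / math.sin(t)) ** 2 / M**2   # peak height / principal max
    print(f"M = {M:5d}: M*b*x/2 = {M * t:.4f}, relative intensity = {ratio:.5f}")
```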

For large $\mathcal{N}$ the secondary maxima are either so close to the principal maxima or of such small amplitude as to be indiscernible, and the pattern is one of sharp peaks. If the slowly varying portion is included, the amplitude of these peaks decreases going away from the central maximum, as shown in Fig. 6.15.

Fig. 6.15. A grating diffraction pattern for large $\mathcal{N}$ including the single-slit pattern.
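The geometric-series rule and Eq. (6.25) can be checked numerically. The sketch assumes illustrative values $a = 0.25$, $b = 1$ (so $w/d = 1/4$) and $N = 4$, i.e. $\mathcal{N} = 9$ slits; the helper names are hypothetical. Principal maxima are handled by their limiting value $\mathcal{N}^2$.

```python
import cmath
import math

def slit_sum(x, b, N):
    """Direct sum of exp(-i*x*n*b) for n = -N..N."""
    return sum(cmath.exp(-1j * x * n * b) for n in range(-N, N + 1))

def closed_form(x, b, N):
    """sin(M x b / 2) / sin(x b / 2) with M = 2N + 1 (valid when sin(x b / 2) != 0)."""
    M = 2 * N + 1
    return math.sin(M * x * b / 2) / math.sin(x * b / 2)

def grating_intensity(x, a, b, N):
    """Eq. (6.25), using the limiting values at x = 0 and at the principal maxima."""
    M = 2 * N + 1
    single = 1.0 if x == 0 else (math.sin(x * a / 2) / (x * a / 2)) ** 2
    s = math.sin(x * b / 2)
    multi = M**2 if abs(s) < 1e-12 else (math.sin(M * x * b / 2) / s) ** 2
    return single * multi

a, b, N = 0.25, 1.0, 4            # assumed: w/d = 1/4 and 9 slits
print(slit_sum(0.37, b, N), closed_form(0.37, b, N))
print(grating_intensity(0.0, a, b, N), grating_intensity(2 * math.pi, a, b, N))
```

The first principal maximum away from the center, at $xb/2 = \pi$, comes out as $\mathcal{N}^2$ times the single-slit factor, which is the slowly varying envelope described above.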


6.9 THE CIRCULAR APERTURE 


We now consider a circular aperture of radius $R$. In terms of the variables $\alpha$ and $\beta$, the integration will be over a circle of radius $kR/r_{20}$, as is indicated in Fig. 6.16. By the symmetry of the problem, if the pattern along the $x$-axis is known, it is known along any other radius in the $xy$-plane. We therefore set $y = 0$ in Eq. (6.21). The integral may be reduced to an integration over a single variable $\alpha$ if the element of integration,

$$dA = 2\sqrt{\left(\frac{kR}{r_{20}}\right)^2 - \alpha^2}\; d\alpha,$$

shown in Fig. 6.16 is used. Once again we ignore the constant in Eq. (6.21) and obtain

$$\psi(x, 0) = \int_{-kR/r_{20}}^{kR/r_{20}} 2\, e^{-i\alpha x}\sqrt{\left(\frac{kR}{r_{20}}\right)^2 - \alpha^2}\; d\alpha.$$

Fig. 6.16. Geometry used in deriving the Fraunhofer diffraction pattern of a circular aperture.


Making the change of variable

$$y = kRx/r_{20}, \qquad u = r_{20}\alpha/kR,$$

we obtain

$$\psi(x, 0) = 2\left(\frac{kR}{r_{20}}\right)^2 \int_{-1}^{1} e^{-iyu}\sqrt{1 - u^2}\; du,$$

which, since the odd (sine) part of the integrand integrates to zero, may be written

$$\psi(x, 0) = 4\left(\frac{kR}{r_{20}}\right)^2 \int_0^1 \sqrt{1 - u^2}\,\cos(yu)\; du.$$

The last integral cannot be evaluated in terms of elementary functions. It can be given in terms of the transcendental function $J_1(y)$, which is known as Bessel's function of the first kind of order one. In fact, it is possible to define this function by

$$J_1(y) = \frac{2y}{\pi}\int_0^1 \sqrt{1 - u^2}\,\cos(yu)\; du.$$
Again dropping multiplicative constants, we may write the intensity in the diffraction pattern as

$$I(x, 0) = \left(\frac{2J_1(y)}{y}\right)^2, \qquad y = kRx/r_{20}. \tag{6.27}$$

As it happens, the intensity given by Eq. (6.27) is normalized to an intensity of unity at the center of the pattern, $x = 0$, since $J_1(y)/y \to 1/2$ as $y \to 0$. A plot of Eq. (6.27) is given in Fig. 6.17. The diffraction pattern of a circular aperture is thus a bright central spot, known as the Airy disk, surrounded by alternate dark and light rings. The first dark ring occurs for $y = 1.22\pi$.

Fig. 6.17. The function $(2J_1(y)/y)^2$.

Fig. 6.18. Overlap of the diffraction pattern of two stars.
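The integral definition of $J_1$ can be evaluated directly. The Python sketch below is an illustration rather than a production Bessel routine (in practice one would use a library function such as SciPy's `scipy.special.j1`): it applies Simpson's rule to the defining integral and bisects for the first zero, which should come out near $3.8317 \approx 1.22\pi$. The helper names are hypothetical.

```python
import math

def j1(y, n=2000):
    """J1(y) from the integral definition in the text,
    J1(y) = (2y/pi) * integral_0^1 sqrt(1 - u^2) cos(y u) du,
    using the composite Simpson rule with n (even) intervals."""
    h = 1.0 / n
    total = 0.0
    for j in range(n + 1):
        u = j * h
        weight = 1 if j in (0, n) else (4 if j % 2 else 2)
        total += weight * math.sqrt(max(0.0, 1 - u * u)) * math.cos(y * u)
    return 2 * y / math.pi * total * h / 3

def airy(y):
    """Eq. (6.27): normalized intensity (2 J1(y) / y)^2, equal to 1 at y = 0."""
    return 1.0 if y == 0 else (2 * j1(y) / y) ** 2

# First dark ring: bisect for the first zero of J1, which lies between 3 and 4.5.
lo, hi = 3.0, 4.5
for _ in range(50):
    mid = 0.5 * (lo + hi)
    if j1(lo) * j1(mid) <= 0:
        hi = mid
    else:
        lo = mid
first_zero = 0.5 * (lo + hi)
print(f"first zero y = {first_zero:.4f} = {first_zero / math.pi:.4f} * pi")
```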

If one considers the point source which illuminates the circular aperture to be a star, and the lens in the aperture to be the objective lens of a telescope, then the image of the star in the focal plane of the objective is the diffraction pattern given in Eq. (6.27). A reasonable criterion, due to Rayleigh, for distinguishing two very close stars is that the center of one pattern can be no closer to the center of the other than the first dark ring. Thus, the angular separation of the stars must be at least the angular separation of the dark ring from the center of the pattern as viewed from the objective. This is shown in Fig. 6.18. Since $\theta_{\min}$ is small, we may set $x/r_{20} = \theta_{\min}$. We thus have

$$1.22\pi = kRx/r_{20} = kR\,\theta_{\min},$$

or, since $k = 2\pi/\lambda$,

$$\theta_{\min} = 1.22\lambda/D,$$

where $D = 2R$ is the diameter of the telescope objective. Photographs of two diffraction patterns satisfying Rayleigh's criterion are given in Fig. 6.19.


6.10 IMAGE FORMATION IN COHERENT LIGHT 


Consider the image-forming system shown in Fig. 6.20. S is a point source which coherently illuminates the aperture of lens $L_1$. The plane P' containing the lens $L_2$ and the plane containing the source S are taken to be conjugate for lens $L_1$. Planes P and P'' are likewise taken to be conjugate for lens $L_2$. A transparency or slide, the object, which may vary in thickness and in absorption, is placed next to $L_1$ in the object plane, P. According to simple geometric optics, an inverted image of this object will be formed in the plane P'' by lens $L_2$. The magnification will be $r_{30}/r_{20}$. We wish to interpret this image formation in terms of Eq. (6.20). Taking the coordinates of a point in plane P to be $\xi$ and $\eta$, the function $u(\xi, \eta)$ describes the object. $\psi(x, y)$ is the value of the state function at a point in the aperture of $L_2$ whose coordinates are $x$ and $y$. The distance from the vertex of $L_1$ to the point $x, y$ is $r$. Finally, the function $\phi(x', y')$ is the state function at point $x', y'$ in the image plane P''; $r'$ is the distance to this point from the vertex of lens $L_2$.

Fig. 6.19. (a) Images of two point sources with first dark rings tangent; (b) images of two point sources which are just resolved according to the Rayleigh criterion. (From Towne, Wave Phenomena, Addison-Wesley, Reading, Mass., 1967.)

Fig. 6.20. A particular image-forming system. An object placed in the plane P is illuminated by a point source at S and an image is formed in the plane P'' by lens $L_2$.

Equation (6.20) may now be applied to the diffraction process between planes P and P', and is written

$$\psi(x, y) = \frac{A\,e^{ikr}}{r_{20}}\,F(x, y), \tag{6.28}$$

where

$$F(x, y) = -\frac{ik}{2\pi r_{10}} \iint d\xi\, d\eta\; u(\xi, \eta)\, e^{-ik(x\xi + y\eta)/r_{20}}. \tag{6.29}$$

Now, to the order of approximation which was used in deriving Eq. (6.20), $r_{20} \approx r$, and the term $A e^{ikr}/r_{20} \approx A e^{ikr}/r$ represents the state function which would be obtained over the aperture of $L_2$ if a point source were placed at the vertex of $L_1$. Thus, placing an object described by $u(\xi, \eta)$ in the plane P with a point source at S yields exactly the same distribution in P' as if we placed a point source at S' and an object described by $F(x, y)$ in the plane P'. Thought of in this way, the distribution in plane P'' may immediately be written down, since the planes P and P'' are also conjugate. The distribution in P'' is


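$$\phi(x', y') = \frac{A\,e^{ikr'}}{r_{30}}\,F'(x', y'), \tag{6.30}$$

where

$$F'(x', y') = -\frac{ik}{2\pi r_{20}} \iint dx\, dy\; F(x, y)\, e^{-ik(xx' + yy')/r_{30}}. \tag{6.31}$$

The distribution in the image plane P'', $F'(x', y')$, may now be written in terms of the distribution in the object plane, $u(\xi, \eta)$, by placing Eqs. (6.28) and (6.29) in Eqs. (6.30) and (6.31). This yields

$$\phi(x', y') = -\left(\frac{k}{2\pi}\right)^2 \frac{A\,\exp(ikr')}{r_{10} r_{20} r_{30}} \iint_P d\xi\, d\eta \iint_{L_2} dx\, dy\; u(\xi, \eta)\, \exp\left\{-ik\left[x\left(\frac{\xi}{r_{20}} + \frac{x'}{r_{30}}\right) + y\left(\frac{\eta}{r_{20}} + \frac{y'}{r_{30}}\right)\right]\right\}, \tag{6.32}$$

where P and $L_2$ beneath the integrals indicate that the integral over $\xi$ and $\eta$ is to be taken over the entire plane P, whereas that in $x$ and $y$ is to be taken only over the aperture of lens $L_2$. This distinction will be examined later, but at this point we assume that the aperture is large enough to contain most of the initial diffraction pattern, $F(x, y)$, of the object $u(\xi, \eta)$. We may then extend the integral to be over the entire $xy$-plane. Referring to Appendix IV, we may write

$$\int_{-\infty}^{+\infty}\!\int_{-\infty}^{+\infty} dx\, dy\; \exp\left\{-ik\left[x\left(\frac{\xi}{r_{20}} + \frac{x'}{r_{30}}\right) + y\left(\frac{\eta}{r_{20}} + \frac{y'}{r_{30}}\right)\right]\right\} = \left(\frac{2\pi}{k}\right)^2 \delta\!\left(\frac{\xi}{r_{20}} + \frac{x'}{r_{30}}\right)\delta\!\left(\frac{\eta}{r_{20}} + \frac{y'}{r_{30}}\right).$$

Placing this in Eq. (6.32) and performing the final integration over $\xi$ and $\eta$, we obtain

$$\phi(x', y') = -\frac{A\,r_{20}}{r_{10} r_{30}}\, e^{ikr'}\, u\!\left(-\frac{r_{20}}{r_{30}}x', -\frac{r_{20}}{r_{30}}y'\right). \tag{6.33}$$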

Thus, the value of the state function in the image at the point $x', y'$, except for multiplicative constants, is the value of the object function at the point

$$\xi = -\frac{r_{20}}{r_{30}}x', \qquad \eta = -\frac{r_{20}}{r_{30}}y'.$$

The image faithfully reproduces the object, is inverted, and is magnified in the ratio $r_{30}/r_{20}$. This is all in agreement with simple geometric optics.
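A discrete analogue of Eq. (6.33) can be seen with a plain discrete Fourier transform: applying the same forward transform twice returns the sampled object with its coordinate reversed, $m \to -m$ (after rescaling by $1/N$), just as the double diffraction process produces an inverted image. The sketch assumes unit magnification (the case $r_{20} = r_{30}$) and uses an arbitrary made-up object; it illustrates the mathematics, not a model of a particular lens system.

```python
import cmath

def dft(seq):
    """Plain O(N^2) discrete Fourier transform with kernel exp(-2 pi i j m / N)."""
    N = len(seq)
    return [sum(seq[j] * cmath.exp(-2j * cmath.pi * j * m / N) for j in range(N))
            for m in range(N)]

obj = [0.0, 1.0, 1.0, 0.2, 0.0, 0.5, 0.0, 0.0]    # an arbitrary sampled "object"
N = len(obj)
image = [v / N for v in dft(dft(obj))]            # two transforms, rescaled by 1/N
for m in range(N):
    print(m, round(image[m].real, 6), obj[(-m) % N])
```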

As has been pointed out, Eq. (6.20) implies that a diffraction pattern in 
the Fraunhofer case is basically the Fourier transform of the state function 
over the diffracting aperture. We have seen in this section that image 
formation in coherent illumination may be viewed as a double diffraction 
process. A diffraction pattern of the object was formed in the aperture of lens $L_2$, and the diffraction pattern of this pattern gave the image in plane P''. This is the essence of Abbe's theory of image formation. The detailed analysis of other lens systems will, of course, vary, but in all cases some intermediate plane will be found over which the state function is the
Fourier transform of the object and the final image is formed by a second 
transformation. 

The practical importance of this viewpoint is that by insertion of various 
filters in the plane of the first transform one may control the manner in which 
the different portions of the first transform recombine to form the final image. 
Some examples of this will be examined in the following section. 


6.11 SPATIAL FILTERING IN COHERENT ILLUMINATION 


As an example of how a final image may be modified by the use of filters which alter the Fourier transform of the object, the following simple case is considered. As the object we take the diffraction grating discussed in Section 6.8. This is represented in Fig. 6.21. For simplicity the lenses $L_1$ and $L_2$ have not been shown and we have set $r_{20} = r_{30}$. According to Section 6.8, the diffraction pattern in plane P' will be a series of bright lines spaced a distance $r_{20}\lambda/d$ apart. We now place an opaque mask in the $xy$-plane blocking off every other line. Thus, only the central line and alternate lines on each side of it are allowed to recombine to form the final image. Since these peaks are spaced twice as far apart as the original pattern, they are characteristic of an initial grating whose slit spacing is $d/2$ rather than $d$. The image formed in the final plane will thus not be that of the original grating, but of a grating whose slits are twice as close together. Hence, the rather surprising result: by removing something at the intermediate plane we have added something to the final image. In particular, we have doubled the number of slits.

Fig. 6.21. Image formation with a diffraction grating as the object. In the figure the distances $r_{20}$ and $r_{30}$ have been set equal.
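The slit-doubling effect can be reproduced with a discrete model. In this sketch (the sample count, period and slit width are arbitrary assumptions) a grating with a period of 8 samples is transformed, every other diffraction order is blocked by a hypothetical mask, and the result is transformed back; the filtered image then repeats every 4 samples.

```python
import cmath

def dft(seq, sign=-1):
    """O(N^2) DFT; sign=-1 is the forward transform, sign=+1 the unscaled inverse."""
    N = len(seq)
    return [sum(seq[j] * cmath.exp(sign * 2j * cmath.pi * j * m / N) for j in range(N))
            for m in range(N)]

N, period, open_width = 64, 8, 3          # assumed sampling of the grating
grating = [1.0 if j % period < open_width else 0.0 for j in range(N)]

spectrum = dft(grating)                    # nonzero only at m = 0, 8, 16, ... (the orders)
order_step = N // period                   # index spacing between neighbouring orders
# Opaque mask: keep the central order and every other order on each side.
filtered = [c if m % (2 * order_step) == 0 else 0j for m, c in enumerate(spectrum)]
image = [v.real / N for v in dft(filtered, sign=+1)]

print([round(v, 3) for v in grating[:8]])
print([round(v, 3) for v in image[:8]])
```

In this discrete model, keeping only the even orders is the same as averaging the grating with a copy of itself shifted by half a period, so each original slit is joined by a new one midway between its neighbours.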

Now consider the effect of the finite diameter of the aperture of lens $L_2$. We have seen that the spatial period of the final image of the grating depends on the spacing of the peaks in the intermediate pattern. Thus, it is reasonable to say that if the final image is to have the same period as the initial object, the aperture of lens $L_2$ must at least be large enough to pass the central maximum and the first maximum on each side. If the number of slits, $\mathcal{N}$, in the grating is large, we may approximate these three peaks in the diffraction pattern by delta functions, and write the distribution

$$F(x) = A\left[\delta\!\left(x + \frac{2\pi r_{20}}{kd}\right) + \delta(x) + \delta\!\left(x - \frac{2\pi r_{20}}{kd}\right)\right].$$

The final transform gives the distribution in $x'$ as

$$F'(x') \propto \int_{-\infty}^{+\infty} F(x)\, e^{-ikxx'/r_{30}}\, dx = A\left[e^{2\pi i r_{20}x'/r_{30}d} + 1 + e^{-2\pi i r_{20}x'/r_{30}d}\right] = A\left[1 + 2\cos\left(\frac{2\pi r_{20}}{r_{30}}\,\frac{x'}{d}\right)\right],$$

which has a spatial period equal to that of the original grating with magnification $r_{30}/r_{20}$. If the diameter of $L_2$ is smaller than that considered above, only the central maximum will be transmitted and the pattern in the $x'$-plane will be one of uniform illumination. Thus, if portions of the original object (in this case, the slits of a grating) are to be resolved in the final image, the diameter of the image-forming lens must satisfy

$$D > 2\,\frac{2\pi r_{20}}{kd} = 2\,\frac{\lambda r_{20}}{d}.$$

From Fig. 6.22, we see that the angle subtended from the lens by two slits is given by $\theta = d/r_{20}$. Combining these two equations, we find that the condition to be satisfied in order that the two slits be resolved by the lens is

$$\theta > 2\lambda/D,$$

which may be compared to Rayleigh's criterion:

$$\theta > 1.22\lambda/D.$$

Fig. 6.22. The angle subtended by two slits as viewed from lens $L_2$.
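The two resolution conditions can be compared numerically. Under assumed values ($\lambda = 500$ nm, $r_{20} = 0.3$ m, $d = 0.1$ mm), this sketch computes the smallest diameter of $L_2$ that passes the central and first-order peaks, confirms that at that diameter the subtended angle $d/r_{20}$ equals $2\lambda/D$, and prints the Rayleigh limit for the same lens. The function name is hypothetical.

```python
import math

def min_diameter(wavelength, r20, d):
    """Smallest diameter of L2 that passes the central and both first-order peaks,
    D = 2 * lambda * r20 / d."""
    return 2 * wavelength * r20 / d

lam, r20, d = 500e-9, 0.3, 1e-4            # assumed: 500 nm light, 30 cm, 0.1 mm slits
D = min_diameter(lam, r20, d)
theta = d / r20                              # angle subtended by two adjacent slits
print(f"D_min = {D * 1e3:.2f} mm")
print(f"theta = {theta:.3e} rad, 2*lambda/D = {2 * lam / D:.3e} rad")
print(f"Rayleigh 1.22*lambda/D = {1.22 * lam / D:.3e} rad")
```

For a given lens, the coherent-illumination condition is thus stricter than Rayleigh's criterion by the factor $2/1.22 \approx 1.6$.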


In the preceding discussion we have examined the image formation process for an object with a basic single period. However, any object may be thought of as a combination of simple spatial variations by considering its Fourier transform. It is this transform which is spread out for us in the plane of lens $L_2$. The value of the state function at the point $x, y$ tells us the extent to which the object is composed of the simple exponential variations $e^{ikx\xi/r_{20}}$ and $e^{iky\eta/r_{20}}$ of spatial periods $r_{20}\lambda/x$ and $r_{20}\lambda/y$. In particular, the value at $x = 0$, $y = 0$ gives the contribution of infinite spatial period, which is simply a uniform illumination. Now, an image of low contrast may be thought of as being an image of high contrast plus an excessive uniform background illumination. One should be able to improve the contrast of an image, therefore, by removing a portion of this background. This may be accomplished by placing a small opaque disk over the center of lens $L_2$. This is illustrated in Fig. 6.23. Figure 6.23(a) is a photograph of an image formed by a lens system as shown in Fig. 6.20. Figure 6.23(b) is a second photograph taken after a small black dot had been placed in the center of lens $L_2$.

Fig. 6.23. (a) An image of low contrast; (b) higher-contrast image formed by the same system with the central portion of the lens obscured. (From O'Neill, Introduction to Statistical Optics, Addison-Wesley, Reading, Mass., 1963.)
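The contrast argument can be illustrated with a single spatial frequency. For a coherent amplitude $a(x) = b + m\cos x$, blocking part of the zero-frequency term reduces $b$ while leaving $m$ untouched. The sketch measures contrast by the Michelson visibility $(I_{\max} - I_{\min})/(I_{\max} + I_{\min})$, which is an assumption here since the text does not fix a contrast measure at this point; it also assumes $b > m$ so that the amplitude never changes sign.

```python
import math

def visibility(dc, ac, samples=360):
    """Michelson visibility (Imax - Imin)/(Imax + Imin) of the intensity
    |dc + ac*cos(phi)|^2 produced by a coherent amplitude with one spatial
    frequency plus a uniform (zero-frequency) part."""
    intens = [(dc + ac * math.cos(2 * math.pi * k / samples)) ** 2 for k in range(samples)]
    return (max(intens) - min(intens)) / (max(intens) + min(intens))

print(visibility(1.0, 0.1))   # strong uniform background: low contrast
print(visibility(0.2, 0.1))   # 80% of the zero-frequency amplitude blocked
```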

The objects considered until now have been amplitude objects, i.e., $u(\xi, \eta)$ is a real function. The case of phase objects, which are transparent but which may vary in thickness or in index of refraction, is of great importance in microscopy. For a pure phase object $|u(\xi, \eta)|^2 = 1$ and Eq. (6.33) tells us that the intensity distribution in the image will be uniform. Writing

$$u(\xi, \eta) = e^{in(\xi, \eta)},$$

we further assume that the retardations, $n$, introduced by the phase object are small, $n(\xi, \eta) \ll 1$, so that we may approximate the exponential by

$$u(\xi, \eta) = 1 + in(\xi, \eta) - \frac{n^2(\xi, \eta)}{2}. \tag{6.34}$$

If Eq. (6.34) is placed in Eq. (6.29), the first of the three integrals yields the diffraction pattern of the aperture in the $\xi\eta$-plane. This will be concentrated in a small region around the origin in the $xy$-plane. The distribution in the remainder of the plane will be determined almost wholly by the remaining two integrals. Furthermore, under the assumption of small $n(\xi, \eta)$, the contribution of these integrals to the pattern at $x = 0$, $y = 0$ may be neglected. The center of the aperture in the $xy$-plane is now covered with a small plate which introduces a retardance of $\lambda/4$ and a fractional absorption, so that the amplitude it transmits is reduced by the factor $f$. Since a retardance of $\lambda/4$ multiplies the amplitude by $e^{i\pi/2} = i$, this is seen as equivalent to replacing Eq. (6.34) by

$$u(\xi, \eta) = if + in(\xi, \eta) - \frac{n^2(\xi, \eta)}{2}. \tag{6.35}$$


Fig. 6.24. Conventional and phase photomicrographs of various diatoms, fibers and bacteria. Note that in each figure details are visible which cannot be seen in the other. (Courtesy of T. J. Lowery and R. Hawley.)


By Eqs. (6.34) and (6.33), the phase object $u(\xi, \eta)$ without the retarder plate would produce an image $\phi(x', y')$, given by

$$\phi(x', y') = -\frac{A\,r_{20}}{r_{10} r_{30}}\, e^{ikr'}\left[1 + in - \tfrac{1}{2}n^2\right], \qquad n = n\!\left(-\frac{r_{20}}{r_{30}}x', -\frac{r_{20}}{r_{30}}y'\right),$$

with $|\phi(x', y')|^2 = A^2 (r_{20}/r_{10} r_{30})^2$, neglecting terms in $n^2$ and higher. With the retarder plate the image will be given by

$$\phi'(x', y') = -\frac{A\,r_{20}}{r_{10} r_{30}}\, e^{ikr'}\left[if + in - \tfrac{1}{2}n^2\right],$$

with $|\phi'(x', y')|^2 = (f^2 + 2fn)\,|\phi(x', y')|^2$ to the same order. If one defines the contrast in the final image as

$$C = \frac{I - I_{\min}}{I_{\min}},$$

where $I_{\min}$ is the intensity in the image where $n = 0$, one obtains

$$C = \frac{2n}{f}.$$

It is seen that even for very small retardations in the original object, e.g., $n = 10^{-3}$, an image of appreciable contrast can be obtained. If $f$ is taken as $10^{-2}$, the contrast is 20%.
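The numbers quoted above can be checked against the unexpanded field. This sketch evaluates the intensity for $u = if + (e^{in} - 1)$, which is Eq. (6.35) before the small-$n$ expansion, and compares the exact contrast with the first-order result $2n/f$; the function name is hypothetical.

```python
import cmath

def contrast(n, f):
    """Exact contrast (I - I_min)/I_min for u = i f + (exp(i n) - 1),
    i.e. Eq. (6.35) before the small-n expansion; I_min is the value at n = 0."""
    intensity = abs(1j * f + cmath.exp(1j * n) - 1) ** 2
    i_min = abs(1j * f) ** 2
    return (intensity - i_min) / i_min

n, f = 1e-3, 1e-2
print(f"exact C = {contrast(n, f):.4f}, first-order 2n/f = {2 * n / f:.4f}")
```

The exact value comes out slightly above 20%; the difference is essentially the $n^2/f^2$ term that is dropped when terms in $n^2$ are neglected.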

Although the optical construction of a commercial phase contrast micro- 
scope differs from the optical system of Fig. 6.20, the optical principle 
involved is as described here. In Fig. 6.24 photomicrographs are shown of 
the object with and without the insertion of the retarder plate. 


PROBLEMS 


6.1 Verify Eq. (6.1) by applying the divergence theorem, vector identity 14, to the function $\Psi\nabla\phi - \phi\nabla\Psi$.

6.2 Using the definite integrals of Appendix VII, verify that the Fresnel integrals of Eq. (6.11) satisfy

$$C(\infty) = S(\infty) = 1/2, \qquad C(-\infty) = S(-\infty) = -1/2.$$


6.3 If an aperture in a plane screen is symmetric about some axis, show directly 
from Eq. (6.21) that the diffraction pattern is symmetric. 

6.4 A square aperture whose side is 0.3 mm is cut in a screen. A monochromatic point source of wavelength $\lambda = 5000$ Å is placed 20 cm to the left of the aperture and a screen is placed 30 cm to the right. Let the observation point be taken at the corner of the geometric shadow of the aperture. Show by using the Cornu spiral that the intensity at the observation point is 0.022 of the intensity at the same point when the diffracting screen is removed.


6.5 The center of a circular aperture is blocked out until only a very thin annular 
region is allowed to transmit. Without a detailed calculation show that in the 
Fraunhofer case minima of the annular diffraction pattern occur at the location of 
the maxima of the original circular aperture pattern. 


6.6 Consider the Fraunhofer pattern of a single slit. Show that the width of the 
central maximum is in qualitative agreement with the uncertainty relation, Eq. 
(5.36), for the position and momentum of a photon. 


6.7 Show that, by Eq. (6.21), a shift in the location of the diffracting aperture does not result in a shift in the diffraction pattern. Interpret this result physically.


6.8 Consider the case of Fresnel diffraction with both source and observation point on the $z$-axis. Let the observation point be moved along the axis until it is at a point of maximum intensity on the axis. If the source distance, $r_{10}$, and the observation point distance, $r_{20}$, are now varied, the observation point being kept at an intensity maximum, show that, approximately,

$$\frac{1}{r_{10}} + \frac{1}{r_{20}} = \text{constant}.$$

From Eq. (6.10), what variation must be ignored to obtain this result?


6.9 Show that the secondary maxima in the Fraunhofer diffraction pattern of a rectangular aperture are located by the solutions of $\tan\alpha_0 x = \alpha_0 x$, $\tan\beta_0 y = \beta_0 y$.


6.10 If |k| = k is constant, show that f(r) of Eq. (6.3) satisfies Eq. (6.2). 


6.11 Babinet's principle for scalar diffraction states that complementary screens produce complementary diffraction patterns. We take complementary screens to mean that one screen is opaque where the other is transparent and vice versa. If $\Psi_1$ and $\Psi_2$ are the state functions produced by complementary screens, $\Psi_1 + \Psi_2 = 1$. Use Babinet's principle to show how the Cornu spiral may be used to compute the diffraction pattern of a rectangular obstacle.


6.12 Show from Eq. (6.4) that a knowledge of the gradient of the incident state function over a diffracting aperture is unnecessary if a function $\phi$ is picked which identically vanishes over the aperture. Such a function is

$$\phi = \frac{e^{ikr}}{r} - \frac{e^{ikr'}}{r'}.$$

The distance $r$ is defined as $r_2$, as in Section 6.1, and $r'$ is as shown in Fig. P6.1. The point Q is located symmetrically across the screen from the observation point P'. Using the function $\phi$ defined above, apply Eq. (6.4) to apertures in plane screens and obtain

$$\Psi = -\frac{ik}{2\pi}\iint dA\; \Psi\,\frac{e^{ikr}}{r},$$

where the integration is over the aperture. The same approximations should be used as were made in Section 6.1.

Figure P6.1


6.13 Another model for a diffraction grating may be constructed as follows. 
A slit of width 2Nd is covered with a mask such that when uniformly illuminated 
from the left the intensity transmitted varies across the slit according to 


27120 
a, 


kd 


WE 
cos —— = cos 
d 
Show that the intensity in the Fraunhofer pattern is given by 
. [kd 
sin x 
1 2reo 
Xi kdx M 
229 
where V = 2N + 1. Note that the denominator vanishes at x = Arzo/d. Interpret 


this result. 


6.14 Show that the same techniques which converted the phase object of Eq. (6.34) into an amplitude image will convert the amplitude object

$$u(\xi, \eta) = 1 + n(\xi, \eta)$$

into a phase image. Assume that the amplitude variation of the object is small, $n(\xi, \eta) \ll 1$.


CHAPTER 7 


THE ELECTROMAGNETIC FIELD AND 
COHERENCE 


In 1873, Maxwell published his Treatise on Electricity and Magnetism. It has since been impossible to deny that light is somehow a manifestation of the electromagnetic field. In fact, by the turn of the century, light was thought to be nothing more than a self-propagating class of electromagnetic field. What forced this identification? Allow Maxwell to answer:

“. . . The properties of the electromagnetic medium are, insofar as we have gone, similar to those of the luminiferous medium, but the best way to compare them is to determine the velocity with which an electromagnetic disturbance would propagate through the medium. If this should be equal to the velocity of light, we would have strong reason to believe that the two media, occupying as they do the same space, are really identical.”*


Today we would prefer to delete the reference to the media, but the message 
remains. Theory does predict these velocities to be equal, and the study of 
optics is forever married to the study of the electromagnetic field. 

We now know that there exist physical entities, e.g., neutrinos, which 
also travel with this velocity, but which are not identified with light. The 
final witness to our marriage is the interaction of light with matter. Light 
interacts with the electric and magnetic properties of matter, i.e., with charge 
and current distributions. Neutrinos do not interact in this way. The honey- 
moon was quickly over. In October, 1900, Planck published his theory of 
black body radiation. Five years later Einstein raised Planck's mathematical 
trick of quantization to the status of accepted physical theory, using it to 
explain the photoelectric effect. 

Maxwell’s theory is of its nature a continuous one. There is no way of 
obtaining from it bundles of energy which are discretely absorbed or emitted. 


* “Ether,” Encyclopedia Britannica, 9th edn., Vol. VIII. Published posthumously in 1878.


Our problem is to superimpose on Maxwell’s theory a discrete interpretation 
of the nature of radiation. In the following chapters we will examine some 
of the aspects of this problem. 


7.1 THE CLASSICAL FIELD 


Maxwell's equations, which completely describe the classical electric and magnetic fields, are given below in MKS units. The vectors E and H are the electric and magnetic intensities. The electric displacement and the magnetic flux density are represented by the symbols D and B. The quantities $\rho$ and J are the scalar charge density and the vector current density.

$$\nabla\times\mathbf{E} = -\frac{\partial\mathbf{B}}{\partial t}, \tag{7.1}$$

$$\nabla\cdot\mathbf{D} = \rho, \tag{7.2}$$

$$\nabla\times\mathbf{H} = \frac{\partial\mathbf{D}}{\partial t} + \mathbf{J}, \tag{7.3}$$

$$\nabla\cdot\mathbf{B} = 0. \tag{7.4}$$

In the absence of dielectric and magnetic materials

$$\mathbf{D} = \epsilon_0\mathbf{E}, \tag{7.5}$$

$$\mathbf{B} = \mu_0\mathbf{H}, \tag{7.6}$$

where $\epsilon_0$ and $\mu_0$ are the permittivity and permeability of free space. The classical force on a particle of charge $q$ moving with velocity v is given by the Lorentz force

$$\mathbf{F} = q(\mathbf{E} + \mathbf{v}\times\mathbf{B}), \tag{7.7}$$

which can be written as a force density in terms of $\rho$ and J:

$$\mathbf{f} = \rho\mathbf{E} + \mathbf{J}\times\mathbf{B}. \tag{7.8}$$


It is assumed that these equations are familiar and that their experimental 
origins need not be discussed. 


7.2 THE WAVE EQUATION AND THE SPEED OF LIGHT 


In free space $\rho$ and J are set equal to zero and we may eliminate B and D from Maxwell's equations, using Eqs. (7.5) and (7.6). We then obtain

$$\nabla\times\mathbf{E} = -\mu_0\frac{\partial\mathbf{H}}{\partial t}, \tag{7.9}$$

$$\nabla\cdot\mathbf{E} = 0, \tag{7.10}$$

$$\nabla\times\mathbf{H} = \epsilon_0\frac{\partial\mathbf{E}}{\partial t}, \tag{7.11}$$

$$\nabla\cdot\mathbf{H} = 0. \tag{7.12}$$

Taking the curl of Eqs. (7.9) and (7.11) and using vector identity 11 from Appendix V, we obtain

$$-\nabla^2\mathbf{E} = -\mu_0\frac{\partial}{\partial t}(\nabla\times\mathbf{H}), \tag{7.13}$$

$$-\nabla^2\mathbf{H} = \epsilon_0\frac{\partial}{\partial t}(\nabla\times\mathbf{E}), \tag{7.14}$$

where the divergence terms have been eliminated by using Eqs. (7.10) and (7.12). Eliminating the curl from the right-hand side by using Eqs. (7.9) and (7.11), we obtain the well-known wave equations for the vectors E and H:

$$\nabla^2\mathbf{E} = \frac{1}{c^2}\frac{\partial^2\mathbf{E}}{\partial t^2}, \tag{7.15}$$

$$\nabla^2\mathbf{H} = \frac{1}{c^2}\frac{\partial^2\mathbf{H}}{\partial t^2}. \tag{7.16}$$

These represent six equations, one for each component of E and H. These equations may not be solved independently, however, for the various components are still interrelated by Maxwell's equations. We have, of course, set

$$c = (\epsilon_0\mu_0)^{-1/2}.$$
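A one-line numerical check of $c = (\epsilon_0\mu_0)^{-1/2}$. The constants are assumptions of this sketch: the CODATA 2018 value of $\epsilon_0$ and the classical defined value $\mu_0 = 4\pi\times10^{-7}$ H/m.

```python
import math

eps0 = 8.8541878128e-12        # F/m (CODATA 2018 value, assumed)
mu0 = 4 * math.pi * 1e-7       # H/m (classical defined value, assumed)
c = 1 / math.sqrt(eps0 * mu0)
print(f"c = {c:.6e} m/s")
```

The result agrees with the measured speed of light, 299 792 458 m/s, to well under a metre per second.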

Let us seek solutions of the wave equations which have a simple exponential time dependence. That is, we seek solutions for the components which can be written* as

$$\mathbf{E}(\mathbf{r}, t) = \mathbf{E}(\mathbf{r})\,e^{-i\omega t}, \qquad \mathbf{H}(\mathbf{r}, t) = \mathbf{H}(\mathbf{r})\,e^{-i\omega t}. \tag{7.17}$$

Placing this form into the wave equation, we find after cancelling the exponential

$$\nabla^2 F_i + k^2 F_i = 0, \tag{7.18}$$

where $F_i$ stands for any of the six components and $|\mathbf{k}| = \omega/c$. Particularly simple solutions of Eq. (7.18) are $e^{i\mathbf{k}\cdot\mathbf{r}}$, giving the space and time dependence as

$$F_i = A_i\,e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)}.$$

* An equally acceptable choice for the time dependence would be $e^{+i\omega t}$. However, the choice of sign in the exponent largely determines all sign conventions for polarization states. To be consistent with the conventions developed in previous chapters we must choose the minus sign here.


Fig. 7.1. A plane of constant phase for the function $e^{i\mathbf{k}\cdot\mathbf{r}}$.

Frequently one chooses either the real or imaginary parts, or some real combination of them, and writes a real solution as

$$F_i = A_i\sin(\mathbf{k}\cdot\mathbf{r} - \omega t + \phi), \tag{7.19}$$

where $\phi$ is an arbitrary phase angle.

In Fig. 7.1 is illustrated a plane surface which is normal to the direction of k. The vector r locates a point in this plane. The term $\mathbf{k}\cdot\mathbf{r}$ is given by $|\mathbf{k}||\mathbf{r}|\cos\theta$. But, since $|\mathbf{r}|\cos\theta$ is the projection of r onto k, this quantity is independent of the location of point r, so long as that point lies on the plane. Hence, $\mathbf{k}\cdot\mathbf{r}$ and $\sin(\mathbf{k}\cdot\mathbf{r} - \omega t + \phi)$ are constant over the plane at any instant of time. As time increases, this plane must move outward along k with velocity $c = \omega/|\mathbf{k}|$ in order to keep $(\mathbf{k}\cdot\mathbf{r} - \omega t)$ constant over the plane and constant in time. Thus, Eq. (7.19) represents a plane wave solution traveling in the direction of k with velocity $c$.
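The plane-wave argument can be checked numerically. This sketch (with an arbitrarily chosen wave vector, a rounded value of $c$, and hypothetical helper names) evaluates one component of Eq. (7.19), confirms by central finite differences that it satisfies the wave equation when $\omega = c|\mathbf{k}|$, and verifies that a point moved a distance $c\,\Delta t$ along k during a time $\Delta t$ sees the same phase.

```python
import math

c = 2.998e8                          # m/s, rounded value (assumption)
k = (2e6, -1e6, 3e6)                 # assumed wave vector components, 1/m
kmag = math.sqrt(sum(ki * ki for ki in k))
omega = c * kmag                     # |k| = omega / c

def F(r, t, A=1.0, phi=1.0):
    """One field component, Eq. (7.19): A sin(k . r - omega t + phi)."""
    return A * math.sin(sum(ki * ri for ki, ri in zip(k, r)) - omega * t + phi)

def laplacian(r, t, h=1e-9):
    """Central-difference estimate of the Laplacian of F in x, y, z."""
    total = 0.0
    for i in range(3):
        rp, rm = list(r), list(r)
        rp[i] += h
        rm[i] -= h
        total += (F(rp, t) - 2 * F(r, t) + F(rm, t)) / h**2
    return total

def d2_dt2(r, t, dt=1e-18):
    """Central-difference estimate of the second time derivative of F."""
    return (F(r, t + dt) - 2 * F(r, t) + F(r, t - dt)) / dt**2

r0 = (1e-7, 2e-7, -1e-7)
lhs, rhs = laplacian(r0, 0.0), d2_dt2(r0, 0.0) / c**2
print(f"Laplacian = {lhs:.6e}, (1/c^2) d2F/dt2 = {rhs:.6e}")

# A point moved a distance c*dt along k during the time dt keeps the same phase.
dt = 1e-15
khat = [ki / kmag for ki in k]
r1 = [ri + c * dt * ui for ri, ui in zip(r0, khat)]
print(F(r0, 0.0), F(r1, dt))
```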

We have shown, therefore, that there exists a class of solutions to 
Maxwell's equations which propagate through space with a velocity equal 
to the velocity of light. Incidentally, we are not surprised to find that the 
scalar state functions which we found in previous chapters are acceptable as 
components of the electric and magnetic fields. We are surprised to find that 
whereas we assumed before that we could describe light by a scalar function, 
we must now use two vector functions. This is an indication of difficulties 
whose discussion we postpone. 




7.3 CONSERVATION OF ENERGY IN THE ELECTROMAGNETIC FIELD 


If we are to associate an electromagnetic field with light, then we must show 
that the field transports energy and we must find an expression for the 
energy transport in terms of the field variables. Consider some fixed volume 
in space which contains a charge and current distribution, p and J. We make 
the following definitions. 

u is the density of any energy associated with the electromagnetic field 
itself. This quantity does not include any kinetic or potential energy of the 
charge distribution. 

S is the flux vector of the field energy, known as the Poynting vector. If n is a unit vector normal to a small area $dA$, $(\mathbf{S}\cdot\mathbf{n})\,dA$ is the amount of field energy which flows through $dA$ per unit time.

dW/dt is the rate at which the electromagnetic field does work on the 
charge and current distributions. 

We may now write the conservation of energy in the following equation: 


ô dW 
-4 fur = [Zaa IPTE (7.20) 
et dt 


The left-hand side is obviously the rate at which the total field energy inside 
the volume is decreasing. Assuming conservation of energy, the energy lost 
by the field must appear either as an increase in the energy of the charge and 
current distributions, the first term on the right, or as a flow of energy out 
of the volume, the second term on the right. 

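The force density on the distributions is given by Eq. (7.8) and the rate 
at which this force does work on the distributions is $\mathbf{f}\cdot\mathbf{v}$, where $\mathbf{v}$ is the 
velocity of the charge distributions. Thus, 

$$ \frac{dW}{dt} = \int_V (\rho\mathbf{E} + \mathbf{J}\times\mathbf{B})\cdot\mathbf{v}\,dV. $$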


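The second term in the dot product is zero, since the current density is in 
the direction of the charge velocity ($\mathbf{J} = \rho\mathbf{v}$, so $\mathbf{J}\times\mathbf{B}$ is perpendicular 
to $\mathbf{v}$). Thus, we have 

$$ \frac{dW}{dt} = \int_V \rho\,\mathbf{E}\cdot\mathbf{v}\,dV = \int_V \mathbf{E}\cdot\mathbf{J}\,dV. \qquad (7.21) $$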


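By using Eqs. (7.1) and (7.3), we may write 

$$ \mathbf{E}\cdot\mathbf{J} = \mathbf{E}\cdot\left(\nabla\times\mathbf{H} - \frac{\partial\mathbf{D}}{\partial t}\right), $$

$$ 0 = \mathbf{H}\cdot\left(\nabla\times\mathbf{E} + \frac{\partial\mathbf{B}}{\partial t}\right). $$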


174 THE ELECTROMAGNETIC FIELD AND COHERENCE 7.4 


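Subtracting one equation from the other, we obtain 

$$ \mathbf{H}\cdot(\nabla\times\mathbf{E}) - \mathbf{E}\cdot(\nabla\times\mathbf{H}) = -\mathbf{H}\cdot\frac{\partial\mathbf{B}}{\partial t} - \mathbf{E}\cdot\frac{\partial\mathbf{D}}{\partial t} - \mathbf{E}\cdot\mathbf{J}. $$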


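We then apply vector identity 9 to the first two terms and eliminate B and D 
by using Eqs. (7.5) and (7.6) and find 

$$ \nabla\cdot(\mathbf{E}\times\mathbf{H}) + \frac{\partial}{\partial t}\left[\tfrac{1}{2}(\mu_0H^2 + \epsilon_0E^2)\right] + \mathbf{E}\cdot\mathbf{J} = 0. $$

This may be integrated over the volume of space in question to yield 

$$ -\frac{\partial}{\partial t}\int_V \tfrac{1}{2}(\mu_0H^2 + \epsilon_0E^2)\,dV = \int_V \mathbf{E}\cdot\mathbf{J}\,dV + \int_V \nabla\cdot(\mathbf{E}\times\mathbf{H})\,dV. $$

Finally, using vector identity 14, we obtain 

$$ -\frac{\partial}{\partial t}\int_V \tfrac{1}{2}(\epsilon_0E^2 + \mu_0H^2)\,dV = \int_V \mathbf{E}\cdot\mathbf{J}\,dV + \oint_A (\mathbf{E}\times\mathbf{H})\cdot\mathbf{n}\,dA. \qquad (7.22) $$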


If we compare this with Eq. (7.20), it is reasonable to make the identifications 

$$ u = \tfrac{1}{2}(\mu_0H^2 + \epsilon_0E^2), \qquad (7.23) $$

$$ \mathbf{S} = \mathbf{E}\times\mathbf{H}, \qquad (7.24) $$


which are the expressions we were seeking for the energy density of the 
electromagnetic field and the Poynting vector. 
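As a rough numerical check of Eqs. (7.22) through (7.24), the Python sketch below evaluates u and S for an assumed test field, a linearly polarized plane wave in vacuum, $\mathbf{E} = E_0\cos(kz - \omega t)\,\mathbf{e}_x$ with $\mathbf{H} = (E_0/\mu_0 c)\cos(kz - \omega t)\,\mathbf{e}_y$, and estimates $\partial u/\partial t + \partial S_z/\partial z$ by central differences. There is no current here, so $\mathbf{E}\cdot\mathbf{J} = 0$ and the local form of Eq. (7.22) says this sum should vanish. The amplitude, wavelength, and step sizes are illustrative choices, not values from the text.

```python
import math

EPS0 = 8.8541878128e-12          # vacuum permittivity, F/m
MU0 = 4e-7 * math.pi             # vacuum permeability, H/m
C = 1.0 / math.sqrt(EPS0 * MU0)  # speed of light from eps0 and mu0

E0 = 100.0                       # assumed field amplitude, V/m
WAVELENGTH = 600e-9              # assumed wavelength, m
K = 2 * math.pi / WAVELENGTH
OMEGA = K * C


def fields(z, t):
    """Ex and Hy of the assumed plane wave travelling along +z."""
    phase = K * z - OMEGA * t
    return E0 * math.cos(phase), E0 / (MU0 * C) * math.cos(phase)


def energy_density(z, t):
    ex, hy = fields(z, t)
    return 0.5 * (EPS0 * ex**2 + MU0 * hy**2)   # Eq. (7.23)


def poynting_z(z, t):
    ex, hy = fields(z, t)
    return ex * hy                              # z-component of E x H, Eq. (7.24)


def continuity_terms(z, t, dz=1e-12, dt=1e-21):
    """Central-difference estimates of du/dt and dS_z/dz and their sum."""
    du_dt = (energy_density(z, t + dt) - energy_density(z, t - dt)) / (2 * dt)
    ds_dz = (poynting_z(z + dz, t) - poynting_z(z - dz, t)) / (2 * dz)
    return du_dt, ds_dz, du_dt + ds_dz


du_dt, ds_dz, residual = continuity_terms(1e-7, 0.0)
print(f"du/dt = {du_dt:.4e}, dS/dz = {ds_dz:.4e}, sum = {residual:.2e}")
```

The two terms are each of order $10^8$ W/m³ but cancel to within finite-difference error; the same functions also give $|\mathbf{S}| = cu$ at every point for this wave.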


7.4 CONSERVATION OF LINEAR MOMENTUM 


In the previous chapters we have discussed the experimental result that the 
ratio of the energy carried per photon to the linear momentum per photon 
is the velocity of light. We might then postulate a vector flux of linear 
momentum given by S/c. In some sense, the flux of a quantity is the density 
of the quantity times the velocity with which the quantity is being carried. 
It is tempting to associate a momentum density with the field given by 


$$ \mathbf{g} = \frac{\mathbf{S}}{c^2}. \qquad (7.25) $$


This is indeed correct, but we must prove that the field equations predict 
this result. We shall also see that the momentum flux is not of the simple 
form postulated. 

In the previous section we were able to define a flux vector S of the 
scalar quantity u. Linear momentum, p, is itself a vector and its flux cannot 
be described by a single vector. We define P to be the 
total momentum density, which is the sum of p, the momentum density of 
the charge and current distributions, and g, the momentum density of the 
fields. Not only is total momentum conserved, but each component is 
individually conserved. We can therefore talk about a flux vector for each 
scalar component of linear momentum. Define $\mathbf{T}_i$ to be the flux vector of 
the i-th component of field momentum $g_i$. Then, as in the preceding section, 
$\mathbf{T}_i\cdot\mathbf{n}\,dA$ is the amount of the i-th component of field momentum which 
flows through dA per unit time. $\mathbf{T}_i$, being a vector, has components. Define 


$T_{ij}$ = j-th component of the flux of the i-th component 
of field linear momentum. (7.26) 


The nine quantities T;; may be thought of as a matrix which is known as 
Maxwell’s tensor. 

The rate at which the momentum p dV in volume dV is increasing 
through interaction with the fields is given by $(d\mathbf{p}/dt)\,dV$. We may now write 
the equation for conservation of the i-th component of linear momentum 
over some fixed volume of space: 

$$ -\frac{\partial}{\partial t}\int_V g_i\,dV = \int_V \frac{dp_i}{dt}\,dV + \oint_A \mathbf{T}_i\cdot\mathbf{n}\,dA. \qquad (7.27) $$


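A decrease in the field momentum must appear either as an increase in the 
charge momentum or as a flux of momentum out of the volume. From 
Newton's second law, the rate of change of the momentum density of the 
charges is the force density of Eq. (7.8): 

$$ \frac{d\mathbf{p}}{dt} = \rho\mathbf{E} + \mathbf{J}\times\mathbf{B}. $$

By use of Eqs. (7.2) and (7.3), this may be written 

$$ \frac{d\mathbf{p}}{dt} = (\nabla\cdot\mathbf{D})\mathbf{E} + \left(\nabla\times\mathbf{H} - \frac{\partial\mathbf{D}}{\partial t}\right)\times\mathbf{B} $$

$$ = \left[(\nabla\cdot\mathbf{D})\mathbf{E} - \mu_0\mathbf{H}\times(\nabla\times\mathbf{H}) + \frac{1}{c^2}\,\mathbf{E}\times\frac{\partial\mathbf{H}}{\partial t}\right] - \frac{\partial}{\partial t}\left(\frac{\mathbf{E}\times\mathbf{H}}{c^2}\right). $$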
By comparing this with Eqs. (7.25) and (7.27), we see that we must express 
the i-th component of the vector in brackets as the divergence of some 
vector. If this is possible, then vector identity 14 will allow us to write the 
volume integral as a surface integral, as in Eq. (7.27). Adding $\mathbf{H}(\nabla\cdot\mathbf{B}) = 0$ 
to the bracket and using Eq. (7.1), we obtain 

$$ (\nabla\cdot\mathbf{D})\mathbf{E} + (\nabla\cdot\mathbf{B})\mathbf{H} - \epsilon_0\mathbf{E}\times(\nabla\times\mathbf{E}) - \mu_0\mathbf{H}\times(\nabla\times\mathbf{H}). $$




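From vector identity 8, 

$$ \mathbf{H}\times(\nabla\times\mathbf{H}) = \tfrac{1}{2}\nabla(H^2) - (\mathbf{H}\cdot\nabla)\mathbf{H}, \qquad \mathbf{E}\times(\nabla\times\mathbf{E}) = \tfrac{1}{2}\nabla(E^2) - (\mathbf{E}\cdot\nabla)\mathbf{E}. $$

This yields for the bracketed term 

$$ \epsilon_0[(\nabla\cdot\mathbf{E})\mathbf{E} + (\mathbf{E}\cdot\nabla)\mathbf{E}] + \mu_0[(\nabla\cdot\mathbf{H})\mathbf{H} + (\mathbf{H}\cdot\nabla)\mathbf{H}] - \tfrac{1}{2}\nabla(\epsilon_0E^2 + \mu_0H^2). $$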


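Letting $x_i$ (i = 1, 2, 3) represent any of the three coordinates x, y, or z, we 
may express the i-th component of this vector as 

$$ \sum_j\left[\epsilon_0\left(E_i\frac{\partial E_j}{\partial x_j} + E_j\frac{\partial E_i}{\partial x_j}\right) + \mu_0\left(H_i\frac{\partial H_j}{\partial x_j} + H_j\frac{\partial H_i}{\partial x_j}\right)\right] - \frac{1}{2}\frac{\partial}{\partial x_i}(\epsilon_0E^2 + \mu_0H^2) $$

$$ = \sum_j \frac{\partial}{\partial x_j}\left[\mu_0H_iH_j + \epsilon_0E_iE_j - \tfrac{1}{2}\delta_{ij}(\epsilon_0E^2 + \mu_0H^2)\right] = \nabla\cdot\mathbf{T}_i, $$

where 

$$ (\mathbf{T}_i)_j = \mu_0H_iH_j + \epsilon_0E_iE_j - \tfrac{1}{2}\delta_{ij}(\epsilon_0E^2 + \mu_0H^2), $$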


which completes the identification. We can thus set 


7.5 PLANE WAVE SOLUTIONS 
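Returning now to the solutions we have found for the wave equation, we 
write the electric and magnetic field as 

$$ \mathbf{E} = \mathbf{E}_0\,e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)}, \qquad \mathbf{H} = \mathbf{H}_0\,e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)}, \qquad (7.28) $$

where $\mathbf{E}_0$ and $\mathbf{H}_0$ are constant vectors. The entire space and time dependence 
of the field is included in the exponential. The vectors $\mathbf{E}_0$ and $\mathbf{H}_0$ are not 
independent of each other, or of k, as we shall see. The solutions in Eq. 
(7.28) are related by Maxwell's equations. We now put Eq. (7.28) into 
Eqs. (7.9) and (7.11). Using vector identity 7, we obtain 

$$ \nabla\times\mathbf{E} = \nabla\big(e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)}\big)\times\mathbf{E}_0 + e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)}\,\nabla\times\mathbf{E}_0, $$

$$ \nabla\times\mathbf{H} = \nabla\big(e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)}\big)\times\mathbf{H}_0 + e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)}\,\nabla\times\mathbf{H}_0, $$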




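where the last terms vanish, since $\mathbf{E}_0$ and $\mathbf{H}_0$ are constant vectors. Using the 
fact that 

$$ \nabla\big(e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)}\big) = i\mathbf{k}\,e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)}, $$

we obtain 

$$ i\mathbf{k}\times\mathbf{E}_0 = i\omega\mu_0\mathbf{H}_0, \qquad i\mathbf{k}\times\mathbf{H}_0 = -i\omega\epsilon_0\mathbf{E}_0, \qquad (7.29) $$

which tell us that $\mathbf{E}_0$, $\mathbf{H}_0$, and k are mutually perpendicular and that k is 
in the direction of S = E × H. Thus, the direction of propagation of the 
wave, k, is in the direction of propagation of the energy flux, S. Although this 
is a reasonable result, it must be pointed out that it results from the particular 
form of the wave we have picked and is not general. Since the vectors are 
orthogonal, we may replace them by their magnitudes. Remembering that 
$\omega = |\mathbf{k}|c$, we obtain 

$$ |\mathbf{E}_0| = \mu_0c\,|\mathbf{H}_0|, \qquad |\mathbf{H}_0| = \epsilon_0c\,|\mathbf{E}_0|, $$

or, for the plane wave solutions, 

$$ \sqrt{\epsilon_0}\,|\mathbf{E}_0| = \sqrt{\mu_0}\,|\mathbf{H}_0|. \qquad (7.30) $$

Thus, for plane waves we may write 

$$ \mathbf{S} = \epsilon_0cE^2\,\hat{\mathbf{k}} = \mu_0cH^2\,\hat{\mathbf{k}}, \qquad (7.31) $$

$$ u = \epsilon_0E^2 = \mu_0H^2. \qquad (7.32) $$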
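Eqs. (7.29) through (7.31) can be checked numerically. In the Python sketch below (standard library only) the wave vector and amplitude are assumed values; $\mathbf{H}_0$ is computed from the first of Eqs. (7.29), and the script then confirms the second of Eqs. (7.29), the mutual perpendicularity of $\mathbf{E}_0$, $\mathbf{H}_0$, and k, Eq. (7.30), and the magnitude and direction of S in Eq. (7.31), evaluated with the real amplitudes at a crest of the wave.

```python
import math

EPS0 = 8.8541878128e-12
MU0 = 4e-7 * math.pi
C = 1.0 / math.sqrt(EPS0 * MU0)


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def h_amplitude(k_vec, e0):
    """H0 from the first of Eqs. (7.29): k x E0 = omega mu0 H0, with omega = |k| c."""
    omega = norm(k_vec) * C
    return tuple(comp / (omega * MU0) for comp in cross(k_vec, e0))


k_vec = (0.0, 0.0, 1.0e7)        # assumed wave vector along z, 1/m
e0 = (3.0, 4.0, 0.0)             # assumed amplitude, V/m (perpendicular to k)
h0 = h_amplitude(k_vec, e0)
omega = norm(k_vec) * C

print("E0 . H0 =", dot(e0, h0))
print("|E0| / (mu0 c |H0|) =", norm(e0) / (MU0 * C * norm(h0)))
print("S =", cross(e0, h0), " eps0 c E0^2 =", EPS0 * C * dot(e0, e0))
```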


7.6 THE POLARIZATION OF PLANE WAVES 


A representation of a wave of the form of Eq. (7.28) is given in Fig. 7.2, 
where we have taken k to be along the z-axis. The vectors drawn indicate 
the direction of E and H at any point in a plane, perpendicular to the z-axis, 
which cuts the z-axis at the point which locates the tail of the vector. Since 
we have chosen a particular direction for these vectors, Eq. (7.28) becomes 

$$ \mathbf{E} = E_0\,e^{i(kz - \omega t)}\,\mathbf{e}_x, \qquad \mathbf{H} = H_0\,e^{i(kz - \omega t)}\,\mathbf{e}_y. \qquad (7.33) $$


Here and in what follows we plot only the real part of the exponential. 
Actually, it is not necessary to draw both E and H. If the direction of k is 
given, then the results of the last section uniquely determine the magnitude 
and direction of H when E is given. From now on we shall draw only the 
vector E. For the wave shown in Fig. 7.2 the vector E points in the x-direction. 
This will not always be so. The particular wave we have shown is the 
electromagnetic representation for a $|P_x\rangle$-polarization state. We are thus 




Fig. 7.2. The electromagnetic plane wave. 


identifying the line of a P-state with the direction of the electric vector. To 
obtain a P-state whose line is at angle θ with the x-axis, we would simply 
rotate Fig. 7.2 through an angle θ about the z-axis. To complete the 
identification of this wave with a P-state, we will describe how a P-filter may be 
constructed whose transmission properties obey Malus' law. In Fig. 7.3 
the electromagnetic wave, whose E vector is shown, is assumed to be coming 
out of the page. The series of straight lines represents fine conducting wires 
stretched in front of the wave. The electric field has been broken into 
components parallel and perpendicular to these wires. The spacing of the 
wires is comparable to the wavelength of the light. From Eq. (7.31) the 
energy flux incident on the wires can be written as the sum of fluxes associated 
with these two perpendicular components of the electric field: 


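$$ \mathbf{S}_0 = \epsilon_0cE_0^2\,\hat{\mathbf{k}} = \epsilon_0c\,(E_\perp^2 + E_\parallel^2)\,\hat{\mathbf{k}}. \qquad (7.34) $$

Fig. 7.3. The electric vector of a plane wave incident on a wire grid may be broken 
into components $E_\perp$ and $E_\parallel$, perpendicular and parallel to the wires. 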




Fig. 7.4. The electric vector for a P-state as the sum of electric vectors of orthogonal 
P-states. 


The oscillating component of the field parallel to the wires will set up an 
oscillating current in these wires. The energy of this component of the field 
will be dissipated in the form of heat owing to the resistance of the wires. 
The perpendicular component cannot set up a current across the wires, 
however, and its energy will not be dissipated but will be transmitted through 
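the array. The transmitted intensity, S, will therefore be, with θ the angle 
between the incident electric vector and the direction perpendicular to the wires, 

$$ \mathbf{S} = \epsilon_0cE_\perp^2\,\hat{\mathbf{k}} = \mathbf{S}_0\cos^2\theta, \qquad (7.35) $$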


which is Malus' law. For microwaves whose wavelength is of the order of 
centimeters the construction of such a P-filter is trivial. For light near the 
visible region one would need to place about 20,000 wires side by side in a 
space of 1 cm! It has, in fact, been possible to do this! For the details the 
reader is referred to the literature.* 
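The wire-grid argument leading to Eq. (7.35) can be sketched in a few lines of Python. The grid here is idealized (an assumption, not a model of any real polarizer): the field component parallel to the wires is removed entirely and the perpendicular component passes unchanged, so the transmitted fraction is the squared perpendicular component over the squared total, as in Eqs. (7.34) and (7.35). The function name is illustrative.

```python
import math


def wire_grid_transmission(e_vec, wire_dir):
    """Fraction S/S0 transmitted by an idealized wire grid.

    e_vec    -- (Ex, Ey) of the incident electric vector
    wire_dir -- (wx, wy), direction of the wires in the same plane
    """
    wx, wy = wire_dir
    length = math.hypot(wx, wy)
    wx, wy = wx / length, wy / length
    px, py = -wy, wx                          # unit vector perpendicular to the wires
    e_par = e_vec[0] * wx + e_vec[1] * wy     # component that drives currents (absorbed)
    e_perp = e_vec[0] * px + e_vec[1] * py    # component that is transmitted
    return e_perp**2 / (e_par**2 + e_perp**2)


# Wires along y, so the transmission axis is x and theta is measured from x.
for deg in (0, 30, 45, 60, 90):
    theta = math.radians(deg)
    frac = wire_grid_transmission((math.cos(theta), math.sin(theta)), (0.0, 1.0))
    print(deg, round(frac, 6), round(math.cos(theta) ** 2, 6))
```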

Since the wave equation is a linear equation, the sum of any two solutions 
is also a solution. Let us add together two waves, one whose electric vector 
is in the x-direction and one whose electric vector is in the y-direction. In 
Fig. 7.4 the vector addition is shown explicitly at only one point but it is 
clear that the result is a new P-state whose line makes an angle θ with the 
x-axis given by 

$$ \tan\theta = E_y/E_x. \qquad (7.36) $$

This reminds us of the fact that the P-state 

$$ |P\rangle = a|P_x\rangle + b|P_y\rangle $$


* Shurcliff, W. A., and S. S. Ballard, Polarized Light, pp. 26 ff, Van Nostrand, 
Princeton, N.J. (1964). 


is a P-state whose line is given by 

$$ \tan\theta = b/a. $$


We will return to this shortly. Let us now examine the electromagnetic 
representation of elliptical states. In Fig. 7.4 we have added two waves which 
are exactly in phase. The two waves reach their maxima and minima at 
exactly the same points on the axis. Let us now add the two waves in such 
a way that when one has a maximum, the other is zero. The result of the 
vector addition is indicated in Fig. 7.5 for the case of equal amplitudes for 
the x- and y-waves. It is seen that the tip of the E-vector lies on a circular 
spiral. The spiral is that of a left-handed screw. The wave in Fig. 7.5 is the 
electromagnetic plane wave representation of right circular polarization. As 
this wave passes through any plane perpendicular to the z-axis, such as the 
x'y'-plane, it is seen that the electric vector in that plane will rotate in a 
counterclockwise sense, when the wave is traveling toward the observer. 
By comparing Figs. 7.4 and 7.5 we see that right circularly polarized light 
may be obtained from a P-state whose line is midway between the x- and 
y-axes, by retarding $E_x$ by one-quarter of a wavelength. Alternatively, one 


Fig. 7.5. The representation of a right circular polarization state in terms of the 
electric vector of a plane wave. 


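may retard $E_x$ by one-eighth of a wavelength while advancing $E_y$ by the 
same amount. This yields 

$$ \mathbf{E}_x = E_0\,e^{i[k(z + \lambda/8) - \omega t]}\,\mathbf{e}_x = e^{i\pi/4}\,E_0\,e^{i(kz - \omega t)}\,\mathbf{e}_x, $$

$$ \mathbf{E}_y = E_0\,e^{i[k(z - \lambda/8) - \omega t]}\,\mathbf{e}_y = e^{-i\pi/4}\,E_0\,e^{i(kz - \omega t)}\,\mathbf{e}_y. \qquad (7.37) $$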


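Thus, if a P-state whose line is midway between the x- and y-axes is 
represented by 

$$ \mathbf{E} = \mathbf{E}_x + \mathbf{E}_y, \qquad \mathbf{E}_x = E_0\,e^{i(kz - \omega t)}\,\mathbf{e}_x, \qquad \mathbf{E}_y = E_0\,e^{i(kz - \omega t)}\,\mathbf{e}_y, $$

a right circular state is given by 

$$ \mathbf{E} = e^{i\pi/4}\,\mathbf{E}_x + e^{-i\pi/4}\,\mathbf{E}_y. $$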
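The effect of the phase factors $e^{\pm i\pi/4}$ can be seen by tracing the real electric vector in a fixed plane, say z = 0, over one period. The Python sketch below takes the real part of each complex component, as the text does, with assumed unit values of $E_0$, k, and ω. It checks only the shape of the curve traced by the tip (a 45° line for the in-phase sum, a circle of radius $E_0$ for the phase-shifted sum); the sense of rotation depends on the viewing direction and on the axes of Fig. 7.5, so it is not asserted here.

```python
import cmath
import math


def e_tip(amplitudes, z, t, k=1.0, omega=1.0):
    """Real (Ex, Ey) at (z, t) for complex amplitudes (ax, ay),
    taking the real part of a * exp[i(kz - omega t)] as in the text."""
    carrier = cmath.exp(1j * (k * z - omega * t))
    return ((amplitudes[0] * carrier).real, (amplitudes[1] * carrier).real)


E0 = 1.0
linear_45 = (E0, E0)                                   # the two waves in phase
right_circular = (cmath.exp(1j * math.pi / 4) * E0,   # Eq. (7.37): E_x retarded,
                  cmath.exp(-1j * math.pi / 4) * E0)  # E_y advanced by lambda/8

samples = [2 * math.pi * j / 12 for j in range(12)]   # t over one period (omega = 1)
line = [e_tip(linear_45, 0.0, t) for t in samples]
circle = [e_tip(right_circular, 0.0, t) for t in samples]
print([round(math.hypot(ex, ey), 6) for ex, ey in circle])   # constant radius
```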


The preceding examples are sufficient to identify the Jones vector associated 
with a polarization state as 


$$ |E\rangle = C\begin{pmatrix} E_x \\ E_y \end{pmatrix}, \qquad (7.38) $$

where the constant C will be determined by the interpretation of $\langle E|E\rangle$ as 
an intensity or a probability. The general elliptical state differs from the 
wave shown in Fig. 7.5 only in that the amplitudes of the two waves need 
not be equal and that the entire wave may be rotated about the z-axis. As 
a general elliptical state passes through a plane perpendicular to the z-axis, 
the electric vector traces out an ellipse, as shown in Fig. 7.6. The direction 


Fig. 7.6. The maximum projection of the electric vector of an elliptical state along 
the line of a given P-filter. 




of rotation is counterclockwise or clockwise as the polarization is right- or 
left-handed. 

If a P-filter is placed in this beam, the transmitted electric vector will 
have the magnitude of the maximum projection of E on the line of the filter. 
This is shown as E; in Fig. 7.6. In terms of the electromagnetic representa- 
tion of states the interpretation of Fig. 3.3 should now be apparent. 


7.7 THE LAWS OF REFLECTION AND REFRACTION 


Continuing to develop the electromagnetic representation of light, we must 
show that electromagnetic waves obey the laws of geometric optics, Snell's 
law, and the law of reflection. In Fig. 7.7 the xy-plane is taken to be the 
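boundary between two dielectric media. Above this plane the index of 
refraction is given by 

$$ n_1 = \frac{c}{v_1} = \sqrt{\frac{\epsilon_1\mu_1}{\epsilon_0\mu_0}} \qquad (7.39) $$

and below the plane it is given by 

$$ n_2 = \frac{c}{v_2} = \sqrt{\frac{\epsilon_2\mu_2}{\epsilon_0\mu_0}}. \qquad (7.40) $$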


Fig. 7.7. General orientation of the wave vectors k, k′, and k″ for reflection and 
refraction at a plane interface. 




A plane wave is incident having wave vector k which, without loss of 
generality, is placed in the yz-plane. A reflected wave is taken in the general 
direction k′ whose orientation is given by the φ′-angles. A transmitted or 
refracted wave is located by k″ and the angles φ″. In what follows the 
cosines of angles are written γ, e.g., $\cos\phi_j = \gamma_j$. Remembering that 
$|\mathbf{k}| = \omega/c$ and $n = c/v$, we write a component of a plane electromagnetic 
wave as 

$$ E_j = A_j\exp\left(i\frac{n\omega}{c}\,\hat{\mathbf{k}}\cdot\mathbf{r}\right)\exp(-i\omega t). $$


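The constants $A_j$ may be complex, thus allowing for a phase difference 
between the different components. We do not, therefore, limit ourselves to 
plane polarized P-states. We thus write for the incident wave 

$$ E_x = A_x\exp\left[\frac{in_1\omega}{c}(-z\cos\phi + y\sin\phi)\right]\exp(-i\omega t), $$

$$ E_y = A_y\exp\left[\frac{in_1\omega}{c}(-z\cos\phi + y\sin\phi)\right]\exp(-i\omega t); \qquad (7.41) $$

for the refracted wave 

$$ E''_x = A''_x\exp\left[\frac{in_2\omega''}{c}(\gamma''_xx + \gamma''_yy + \gamma''_zz)\right]\exp(-i\omega''t), \qquad (7.42) $$

$$ E''_y = A''_y\exp\left[\frac{in_2\omega''}{c}(\gamma''_xx + \gamma''_yy + \gamma''_zz)\right]\exp(-i\omega''t); $$

and for the reflected wave 

$$ E'_x = A'_x\exp\left[\frac{in_1\omega'}{c}(\gamma'_xx + \gamma'_yy + \gamma'_zz)\right]\exp(-i\omega't), \qquad (7.43) $$

$$ E'_y = A'_y\exp\left[\frac{in_1\omega'}{c}(\gamma'_xx + \gamma'_yy + \gamma'_zz)\right]\exp(-i\omega't), $$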


where we have included the possibility that reflection and refraction might 
alter w. 

It is well known that the component of the total electric field E which is 
tangent to the boundary between two simple dielectrics must take on the 
same value on both sides of the boundary. Thus, for z = 0, 

$$ E_x + E'_x = E''_x, \qquad (7.44) $$

$$ E_y + E'_y = E''_y. \qquad (7.45) $$

Since Eqs. (7.44) and (7.45) are valid for all x and y, they are valid for x 
and y equal to zero. It is left to the problems to prove that this implies 
$\omega = \omega' = \omega''$. We may then cancel the time exponentials in Eqs. (7.44) and 
(7.45) and obtain 

$$ A_x\exp\left[\frac{in_1\omega}{c}y\sin\phi\right] + A'_x\exp\left[\frac{in_1\omega}{c}(\gamma'_xx + \gamma'_yy)\right] = A''_x\exp\left[\frac{in_2\omega}{c}(\gamma''_xx + \gamma''_yy)\right], $$

$$ A_y\exp\left[\frac{in_1\omega}{c}y\sin\phi\right] + A'_y\exp\left[\frac{in_1\omega}{c}(\gamma'_xx + \gamma'_yy)\right] = A''_y\exp\left[\frac{in_2\omega}{c}(\gamma''_xx + \gamma''_yy)\right]. $$




Fig. 7.8. Coplanar geometry for reflection and refraction. 


The only way in which these equations may be satisfied for all values of x 
and y is if the exponentials may be cancelled from both sides. This is possible 
only if 

$$ \gamma'_x = \gamma''_x = 0, \qquad (7.46) $$

$$ n_1\sin\phi = n_1\gamma'_y = n_2\gamma''_y, \quad\text{or}\quad n_1\sin\phi = n_1\sin\phi' = n_2\sin\phi''. \qquad (7.47) $$

Equation (7.46) tells us that k, k′, and k″ are coplanar and may be represented 
as in Fig. 7.8. Equation (7.47) is equivalent to the law of reflection, 

$$ \phi = \phi', $$

and the law of refraction, or Snell's law, 

$$ n_1\sin\phi = n_2\sin\phi''. $$
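Snell's law in the form just derived, together with the critical angle that appears in Eq. (7.55), is easy to evaluate numerically. The helper names in this Python sketch are illustrative; angles are in radians and measured from the normal, and no real refraction angle exists when $n_1\sin\phi/n_2 > 1$.

```python
import math


def refraction_angle(n1, n2, phi):
    """phi'' from n1 sin(phi) = n2 sin(phi''); None if no real solution exists."""
    s = n1 * math.sin(phi) / n2
    if abs(s) > 1.0:
        return None          # beyond the critical angle: no refracted ray
    return math.asin(s)


def critical_angle(n1, n2):
    """sin(phi_c) = n2/n1, defined only when n2/n1 < 1."""
    return math.asin(n2 / n1) if n2 < n1 else None


print(math.degrees(refraction_angle(1.0, 1.5, math.radians(30.0))))  # about 19.47
print(math.degrees(critical_angle(1.5, 1.0)))                        # about 41.81
print(refraction_angle(1.5, 1.0, math.radians(60.0)))                # None
```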


7.8 FRESNEL'S EQUATIONS 


In the last section we have determined the relation between the directions 
of the incident wave and the directions of the reflected and refracted waves. 
We proceed to derive Fresnel's equations, which relate the intensities of these 
waves to the incident intensity. These intensities depend on the orientation 
of the incident electric vector. We have shown that k, k’, and k” lie in the 
yz-plane, which will be called the plane of incidence. We distinguish the 
parallel case where the electric vector is parallel to the plane of incidence and 
the perpendicular case where the electric vector is perpendicular to this plane. 
We first consider the parallel case, which is illustrated in Fig. 7.9. We now 




Fig. 7.9. The parallel case. The direction of the electric vector of the incident wave 
is parallel to the plane of incidence. 


impose the condition that the tangential components of both the electric 
field E and the magnetic field H must be continuous across the boundary 
between two simple dielectrics. In the parallel case this condition leads to 

$$ E\cos\phi + E'\cos\phi = E''\cos\phi'', \qquad H - H' = H''. \qquad (7.48) $$

Since for a plane wave $\sqrt{\epsilon}\,E = \sqrt{\mu}\,H$, the last equation may be written 

$$ \sqrt{\epsilon_1/\mu_1}\,E - \sqrt{\epsilon_1/\mu_1}\,E' = \sqrt{\epsilon_2/\mu_2}\,E''. \qquad (7.49) $$

If the two media are simple dielectrics whose magnetic properties do not 
appreciably differ from those of the vacuum, we may set $\mu_1 = \mu_2 = \mu_0$, 
which leads to 

$$ n_1E - n_1E' = n_2E'', $$

and, imposing Snell's law, we obtain 

$$ \sin\phi''\,E - \sin\phi''\,E' = \sin\phi\,E''. \qquad (7.50) $$


Equations (7.48) and (7.50) may be treated as two simultaneous linear 
equations which may be solved for E′ and E″ to yield for the parallel case 

$$ E''_p = \frac{2\sin\phi''\cos\phi}{\sin(\phi+\phi'')\cos(\phi-\phi'')}\,E_p, \qquad (7.51) $$

$$ E'_p = -\frac{\tan(\phi-\phi'')}{\tan(\phi+\phi'')}\,E_p. \qquad (7.52) $$


Fig. 7.10. The perpendicular case. The direction of the electric vector of the incident 
wave is perpendicular to the plane of incidence. 


The perpendicular case is illustrated in Fig. 7.10. By methods completely 
equivalent to those which led to Eqs. (7.51) and (7.52) one may show that in 
this case 


$$ E''_s = \frac{2\sin\phi''\cos\phi}{\sin(\phi+\phi'')}\,E_s, \qquad (7.53) $$

$$ E'_s = -\frac{\sin(\phi-\phi'')}{\sin(\phi+\phi'')}\,E_s. \qquad (7.54) $$

Equations (7.51) through (7.54) are known as Fresnel's equations. The 
subscript p has an obvious origin in the parallel case. The subscript s comes 
from the German, senkrecht, meaning perpendicular. The angles φ and φ″ 
are, of course, related by Snell's law, $n_1\sin\phi = n_2\sin\phi''$. The numerical 
values given by Fresnel's equations are highly dependent on the ratio 
$n_2/n_1$. If this ratio is less than 1, there exists a maximum angle of incidence 
$\phi_c$ for which Snell's law admits a solution, known as the critical angle, 
given by 

$$ \sin\phi_c = n_2/n_1. \qquad (7.55) $$


For angles larger than this there is no propagating transmitted wave; the 
incident light is totally reflected. 

In Figs. 7.11 and 7.12 the coefficients in Fresnel's equations are plotted 
as a function of the incident angle φ, for $n_2/n_1 = 3/2 > 1$ and $n_2/n_1 = 2/3 < 1$. 
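The curves of Figs. 7.11 and 7.12 can be regenerated from Eqs. (7.51) through (7.54). The Python sketch below does so under two stated assumptions: the sign conventions are those of Eqs. (7.48) through (7.50) as written here (other texts choose different positive directions for E′ and may flip the sign of the reflected amplitudes), and $\mu_1 = \mu_2 = \mu_0$. The helper names are illustrative. The angle must satisfy 0 < φ < critical angle, since the angle forms are indeterminate at normal incidence.

```python
import math


def fresnel(n1, n2, phi):
    """Amplitude ratios of Eqs. (7.51)-(7.54) for 0 < phi < critical angle (radians)."""
    s = n1 * math.sin(phi) / n2
    if phi <= 0.0 or s >= 1.0:
        raise ValueError("need 0 < phi and incidence below the critical angle")
    phi2 = math.asin(s)                     # refraction angle phi''
    plus, minus = phi + phi2, phi - phi2
    return {
        "phi2": phi2,
        "t_p": 2 * math.sin(phi2) * math.cos(phi) / (math.sin(plus) * math.cos(minus)),
        "r_p": -math.tan(minus) / math.tan(plus),
        "t_s": 2 * math.sin(phi2) * math.cos(phi) / math.sin(plus),
        "r_s": -math.sin(minus) / math.sin(plus),
    }


def tabulate(ratio, steps=9):
    """Rows (phi in degrees, t_p, r_p, t_s, r_s) for n2/n1 = ratio."""
    rows = []
    for j in range(1, steps):
        phi = (math.pi / 2) * j / steps
        try:
            c = fresnel(1.0, ratio, phi)
        except ValueError:
            break                           # past the critical angle
        rows.append((math.degrees(phi), c["t_p"], c["r_p"], c["t_s"], c["r_s"]))
    return rows


for ratio in (1.5, 2 / 3):                  # the two cases of Figs. 7.11 and 7.12
    print(f"n2/n1 = {ratio:.3f}")
    for row in tabulate(ratio):
        print("  " + "  ".join(f"{v:8.4f}" for v in row))
```

A useful cross-check, included in the tests, is energy conservation across the boundary: $r^2 + (n_2\cos\phi''/n_1\cos\phi)\,t^2 = 1$ for both polarizations.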


Fig. 7.11. A plot of Fresnel's equations (curves labeled "transmitted" and "reflected") 
when the incident wave is in the less dense medium. The ratio $n_2/n_1$ has been taken as 3/2. 


Fig. 7.12. A plot of Fresnel's equations (curves labeled "transmitted" and "reflected") 
when the incident wave is in the more dense medium. The ratio $n_2/n_1$ has been taken as 2/3. 




7.9 THE COMPLEX FIELD 


We have been using complex, rather than real, solutions to Maxwell's 
equations because of the simplicity this introduces into many calculations. 
However, it must be remembered that whenever an actual measurement of 
the electromagnetic fields is made, it is a real number which is obtained. In 
what follows we are going to show that if $V^r(\mathbf{r}, t)$ is a real scalar function 
of position and time (for example, one of the components of the electric 
field), then there is a unique and useful way to associate with it a complex 
function $V(\mathbf{r}, t)$. We do not assume that the field has a simple monochromatic 
time variation but that it may be represented as a Fourier superposition 
of such variations. We also suppress the explicit space dependence and 
write 

$$ V^r(t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty} \mathcal{V}(\omega)\,e^{-i\omega t}\,d\omega. \qquad (7.56) $$
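Since $V^r(t)$ is real, $V^{r*} = V^r$. Applying this to Eq. (7.56), 

$$ \int_{-\infty}^{+\infty} \mathcal{V}^*(\omega)\,e^{i\omega t}\,d\omega = \int_{-\infty}^{+\infty} \mathcal{V}(\omega)\,e^{-i\omega t}\,d\omega, $$

but, by a change of dummy variable on the right, ω being replaced by −ω, 
we find 

$$ \int_{-\infty}^{+\infty} \mathcal{V}^*(\omega)\,e^{i\omega t}\,d\omega = \int_{-\infty}^{+\infty} \mathcal{V}(-\omega)\,e^{i\omega t}\,d\omega, $$

or 

$$ \int_{-\infty}^{+\infty} [\mathcal{V}^*(\omega) - \mathcal{V}(-\omega)]\,e^{i\omega t}\,d\omega = 0. $$

Thus, $\mathcal{V}^*(\omega) - \mathcal{V}(-\omega)$ is the Fourier transform of zero and is itself zero, 
giving 

$$ \mathcal{V}^*(\omega) = \mathcal{V}(-\omega), \qquad (7.57) $$

which is known as the reality condition. 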
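A discrete analogue of Eq. (7.57) is easy to check. In the Python sketch below the integral is replaced by a finite sum over N samples (an assumption of the discretization, with indices taken modulo N so that −m means N − m), and the forward kernel is $e^{+i\omega t}$ to match the transform pair implied by Eq. (7.56). For real samples the spectrum satisfies X*(m) = X(−m); a complex signal generally does not. The function names are illustrative.

```python
import cmath
import math


def spectrum(samples):
    """X[m] = sum_n x[n] exp(+2 pi i m n / N), an O(N^2) DFT."""
    n_pts = len(samples)
    return [sum(x * cmath.exp(2j * math.pi * m * n / n_pts) for n, x in enumerate(samples))
            for m in range(n_pts)]


def reality_violation(samples):
    """Largest |X*(m) - X(-m)| over all m; zero (to rounding) for real samples."""
    spec = spectrum(samples)
    n_pts = len(spec)
    return max(abs(spec[m].conjugate() - spec[(-m) % n_pts]) for m in range(n_pts))


real_signal = [math.cos(0.3 * n) + 0.2 * n for n in range(16)]
complex_samples = [cmath.exp(2j * math.pi * 3 * n / 16) for n in range(16)]
print(reality_violation(real_signal))      # rounding-level
print(reality_violation(complex_samples))  # of order N
```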
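We now split Eq. (7.56) into two integrals: 

$$ V^r(t) = \frac{1}{\sqrt{2\pi}}\int_{0}^{\infty} \mathcal{V}(\omega)\,e^{-i\omega t}\,d\omega + \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{0} \mathcal{V}(\omega)\,e^{-i\omega t}\,d\omega. $$

Making use of Eq. (7.57) and again replacing ω by −ω in the second 
integral, we obtain 

$$ V^r(t) = \frac{1}{\sqrt{2\pi}}\int_{0}^{\infty} \mathcal{V}(\omega)\,e^{-i\omega t}\,d\omega + \frac{1}{\sqrt{2\pi}}\int_{0}^{\infty} \mathcal{V}^*(\omega)\,e^{i\omega t}\,d\omega, $$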




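which is of the form 

$$ V^r(t) = \tfrac{1}{2}[V(t) + V^*(t)], $$

where 

$$ V(t) = \frac{2}{\sqrt{2\pi}}\int_{0}^{\infty} \mathcal{V}(\omega)\,e^{-i\omega t}\,d\omega. \qquad (7.58) $$

Equation (7.58) defines the complex function associated with the real function 
given by Eq. (7.56). If $V^r$ and $V^i$ are the real and imaginary parts of V, it 
is easily shown that 

$$ \int_{-\infty}^{+\infty} (V^r)^2\,dt = \int_{-\infty}^{+\infty} (V^i)^2\,dt = \frac{1}{2}\int_{-\infty}^{+\infty} VV^*\,dt \qquad (7.59) $$

and 

$$ \int_{-\infty}^{+\infty} V^2\,dt = \int_{-\infty}^{+\infty} (V^*)^2\,dt = \int_{-\infty}^{+\infty} V^rV^i\,dt = 0. \qquad (7.60) $$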
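Eqs. (7.58) through (7.60) also have a simple discrete counterpart, sketched below in Python. Assumptions: N samples spanning exactly one period of every component, no zero-frequency or Nyquist content (the discrete stand-in for $\mathcal{V}(0) = 0$), and the text's sign convention in which positive frequencies appear as $e^{-i\omega t}$. Keeping only the positive-frequency half of the spectrum and doubling it gives a complex V whose real part reproduces the input; the sums then satisfy the discrete versions of Eqs. (7.59) and (7.60). The function names are illustrative.

```python
import cmath
import math


def dft(values, sign):
    """sum_n v[n] exp(sign * 2 pi i m n / N) for m = 0..N-1 (O(N^2))."""
    n_pts = len(values)
    return [sum(v * cmath.exp(sign * 2j * math.pi * m * n / n_pts)
                for n, v in enumerate(values))
            for m in range(n_pts)]


def complex_signal(real_samples):
    """Discrete analogue of Eq. (7.58) for an even number of real samples."""
    n_pts = len(real_samples)
    spec = dft(real_samples, +1)            # forward kernel exp(+i w t)
    weights = [0.0] * n_pts                 # bins above N/2 (negative frequencies) dropped
    weights[0] = 1.0                        # zero frequency kept once
    weights[n_pts // 2] = 1.0               # Nyquist bin kept once
    for m in range(1, n_pts // 2):
        weights[m] = 2.0                    # positive frequencies doubled
    filtered = [w * s for w, s in zip(weights, spec)]
    return [v / n_pts for v in dft(filtered, -1)]   # inverse kernel exp(-i w t)


N = 32
x = [math.cos(2 * math.pi * 3 * n / N) + 0.5 * math.sin(2 * math.pi * 5 * n / N)
     for n in range(N)]
v = complex_signal(x)
sum_rr = sum(z.real ** 2 for z in v)
sum_ii = sum(z.imag ** 2 for z in v)
sum_ri = sum(z.real * z.imag for z in v)
print(sum_rr, sum_ii, sum_ri)
```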


7.10 THE PSEUDOMONOCHROMATIC CASE 


In no case is it possible to produce perfectly monochromatic electromagnetic 
radiation. Of great practical importance, therefore, is the case of nearly 
monochromatic radiation. This may be called either the pseudomonochromatic 
or the quasimonochromatic case. By these terms is meant that 
the Fourier transform of the radiation, $\mathcal{V}(\omega)$, is zero except in some narrow 
band of frequencies, Δω, small in comparison with the midfrequency of the 
band, $\omega_0$. Values for $\Delta\omega/\omega_0$ of 10^-? to 10^-? are easily attained in the range 
of optical frequencies and much lower values can be obtained from laser 
sources; 10^-? to 10^-? are typical for lasers. In practice the pseudomonochromatic 
case arises from nearly monochromatic sources and/or the use of 
narrow-band transmission filters with all detectors. 

We wish to show that in the pseudomonochromatic case V(t) has a 
simple monochromatic time dependence, $e^{-i\omega_0 t}$, modulated by a slowly 
varying envelope function. To see this we multiply Eq. (7.58) by $e^{i\omega_0 t}$: 

$$ e^{i\omega_0 t}\,V(t) = \frac{2}{\sqrt{2\pi}}\int_{0}^{\infty} \mathcal{V}(\omega)\,e^{-i(\omega-\omega_0)t}\,d\omega. $$

If $\mathcal{V}(\omega)$ is different from zero only if $\omega \approx \omega_0$, then only those exponentials 
will contribute to the integral for which $\Delta\omega = \omega - \omega_0 \approx 0$. Thus, the 
integral and, hence, $V(t)\,e^{i\omega_0 t}$, are slowly varying and we may write 

$$ V(t) = A(t)\,e^{i\phi(t)}\,e^{-i\omega_0 t}, $$

where A(t) and φ(t) are slowly varying functions compared to $e^{-i\omega_0 t}$. An 
illustration of the variation of V(t) is given in Fig. 7.13, though the variation 




Fig. 7.13. The time variation of a pseudomonochromatic variable V(t). 


is greatly exaggerated, since $\Delta\omega/\omega_0$ gives a measure of the relative rates of 
variation of the envelope function A(t) and the carrier $e^{-i\omega_0 t}$. Since this 
quantity can easily be taken as 10^-? or smaller, it would appear that there 
is little difference between the monochromatic and the pseudomonochromatic 
cases. On the contrary, we shall see that this difference introduces us 
to a whole new range of optical phenomena. 


7.11 COHERENCE 


The concept of coherence was briefly examined in Section 4.7. There the 
two extremes of completely coherent and completely incoherent super- 
position were discussed. We now examine in more detail these extremes and 
also the intermediate cases. For the moment we consider only one field 
variable or component of the field and postpone the problems of polarization 
until later. Consider any two points within the electromagnetic field located 
at $\mathbf{r}_1$ and $\mathbf{r}_2$ and a third point p. As in Fig. 7.14, points $\mathbf{r}_1$ and $\mathbf{r}_2$ have been 
isolated by the introduction of a screen with small holes at these points. If 
these holes are small enough, then scalar diffraction theory tells us that the 
field variable at p will be proportional to the sum of two disturbances, one 
due to each of the source holes, and may be written 

$$ V(p, t) = C\,[V(\mathbf{r}_1, t_1) + V(\mathbf{r}_2, t_2)]. \qquad (7.61) $$

We have explicitly indicated that the disturbance at p at time t is due to the 
disturbances at $\mathbf{r}_1$ and $\mathbf{r}_2$ at earlier times $t_1$ and $t_2$. Because of the different 
path lengths, $t_1$ and $t_2$ are not necessarily equal. The intensity measured at 
point p by a detector will be given by $I \propto V^*V$, with 

$$ I = V^*V = V_1V_1^* + V_2V_2^* + V_1V_2^* + V_1^*V_2, \qquad (7.62) $$

where $V_1 \equiv V(\mathbf{r}_1, t_1)$, $V_2 \equiv V(\mathbf{r}_2, t_2)$, and the constants of proportion have either been set equal to 1 or absorbed 
into the V's. The detector will actually measure some time average of V*V. 




A lower bound for this averaging time with modern detectors can be placed 
at about 10^-? sec. This is to be compared with the period of oscillation of 
light of wavelength 6000 Å, which is $2\times10^{-15}$ sec. The average is therefore over 
a comparatively long time. We are also going to assume that the field is 
stationary. That is, although there may be short-range time fluctuations in 
the field variables, an averaging process will yield the same result independent 
of the origin of the averaging process in time. We identify 

$$ I_1 = \overline{V_1V_1^*}, \qquad I_2 = \overline{V_2V_2^*}, $$

where the bar indicates a time average; $I_1$ and $I_2$ are the intensities measured at p if 
only one of the apertures were open. Following Wolf, we introduce the 
complex mutual coherence function defined by 

$$ \Gamma_{12}(\tau) = \overline{V(\mathbf{r}_1, t+\tau)\,V^*(\mathbf{r}_2, t)}. \qquad (7.63) $$

Because the field is stationary, Eq. (7.62) may be written in terms of this function, 
with $\tau = t_1 - t_2$, as 

$$ I = I_1 + I_2 + 2\,\mathrm{Re}\,\Gamma_{12}(\tau), $$

where Re indicates the real part of the function. When $\mathbf{r}_1 = \mathbf{r}_2$, $\Gamma_{11}(\tau)$ is 
known as the self-coherence function, which as $\tau \to 0$ becomes simply the 
intensity. The complex degree of coherence is defined as 

$$ \gamma_{12}(\tau) = \frac{\Gamma_{12}(\tau)}{\sqrt{\Gamma_{11}(0)\,\Gamma_{22}(0)}} = \frac{\Gamma_{12}(\tau)}{\sqrt{I_1I_2}}. \qquad (7.64) $$

From the definition of a time average 

$$ \Gamma_{12}(\tau) = \overline{V_1(t+\tau)V_2^*(t)} = \lim_{T\to\infty}\frac{1}{2T}\int_{-T}^{+T} V_1(t+\tau)\,V_2^*(t)\,dt, $$

and from the Schwarz inequality for any two complex functions f and g 

$$ \left|\int f^*g\,dt\right|^2 \le \int f^*f\,dt\int g^*g\,dt. $$

It follows that 

$$ 0 \le |\gamma_{12}(\tau)| \le 1. $$


In keeping with Section 4.7, the extreme values 0 and 1 are, respectively, 
associated with completely incoherent and coherent superpositions. 
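Eqs. (7.63) and (7.64) and the bound $0 \le |\gamma_{12}| \le 1$ can be illustrated with sampled signals. In the Python sketch below the time average is a finite mean over the overlapping samples (an assumption standing in for the limit T → ∞), the lag τ is an integer number of samples, and $I_1$ and $I_2$ are estimated over the same window so that the discrete Schwarz inequality applies exactly. The test signals are synthetic complex Gaussian noise, not data from the text, and the function name is illustrative.

```python
import cmath
import math
import random


def degree_of_coherence(v1, v2, lag):
    """Discrete gamma_12(tau): mean of v1[t+lag] v2*[t] over overlapping samples,
    normalized by the mean intensities over the same samples (Eq. 7.64)."""
    n_pts = len(v1) - lag
    w1 = v1[lag:lag + n_pts]
    w2 = v2[:n_pts]
    gamma_num = sum(a * b.conjugate() for a, b in zip(w1, w2)) / n_pts
    i1 = sum(abs(a) ** 2 for a in w1) / n_pts
    i2 = sum(abs(b) ** 2 for b in w2) / n_pts
    return gamma_num / math.sqrt(i1 * i2)


rng = random.Random(1)
a = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(4000)]
b = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(4000)]
scaled = [2 * cmath.exp(0.3j) * z for z in a]   # same signal, new amplitude and phase

print(abs(degree_of_coherence(a, scaled, 0)))   # fully coherent: 1 up to rounding
print(abs(degree_of_coherence(a, b, 0)))        # independent noise: near 0
```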

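It is of some usefulness to write $\Gamma_{12}$ explicitly in terms of the real and 
imaginary parts of V: 

$$ \Gamma_{12}(\tau) = \overline{V_1^r(t+\tau)V_2^r(t)} + \overline{V_1^i(t+\tau)V_2^i(t)} + i\,\overline{V_1^i(t+\tau)V_2^r(t)} - i\,\overline{V_1^r(t+\tau)V_2^i(t)}. \qquad (7.65) $$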




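Now consider 

$$ \overline{V_1(t+\tau)V_2(t)} = \frac{1}{2T}\int_{-T}^{+T} dt\,\frac{2}{\pi}\int_0^\infty\!\!\int_0^\infty d\omega\,d\omega'\,\mathcal{V}_1(\omega)\,\mathcal{V}_2(\omega')\,e^{-i\omega(t+\tau)}\,e^{-i\omega' t}. $$

If the averaging time, T, is very large, the integral over t yields a delta 
function, $\delta(\omega + \omega')$. The remaining integrals are therefore nonzero only for 
$\omega + \omega' = 0$, which implies that either ω or ω′ is negative or that both are 
zero. But $\mathcal{V}(0) = 0$ and the integrations are only over positive ω and ω′. 
Therefore, 

$$ \overline{V_1(t+\tau)V_2(t)} = 0, $$

which, if written out in terms of real and imaginary parts, yields 

$$ \overline{V_1^r(t+\tau)V_2^r(t)} = \overline{V_1^i(t+\tau)V_2^i(t)}, \qquad \overline{V_1^i(t+\tau)V_2^r(t)} = -\overline{V_1^r(t+\tau)V_2^i(t)}. \qquad (7.66) $$

Equations (7.65) and (7.66) may be combined to yield 

$$ \mathrm{Re}\,\Gamma_{12}(\tau) = 2\,\overline{V_1^r(t+\tau)V_2^r(t)} = 2\,\overline{V_1^i(t+\tau)V_2^i(t)}, \qquad (7.67) $$

$$ \mathrm{Im}\,\Gamma_{12}(\tau) = 2\,\overline{V_1^i(t+\tau)V_2^r(t)} = -2\,\overline{V_1^r(t+\tau)V_2^i(t)}. \qquad (7.68) $$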


7.12 MEASUREMENT OF γ12 

In this section we wish to show how the modulus of the complex degree of 
coherence, $|\gamma_{12}(\tau)|$, may be measured and how in principle the phase of this 
function may also be determined. The determination of the phase of $\gamma_{12}$ is 
still an important theoretical and experimental problem. 

We consider Fig. 7.14 to represent an actual two-slit diffraction experiment 
with $S_0$ a pseudomonochromatic source. Since the source is pseudomonochromatic, 
we may write 

$$ V_1(t+\tau) = A_1(t+\tau)\,e^{i\phi_1(t+\tau)}\,e^{-i\omega_0(t+\tau)}, $$

$$ V_2^*(t) = A_2(t)\,e^{-i\phi_2(t)}\,e^{i\omega_0 t}, $$

where the functions $A_1$, $A_2$, $\phi_1$, and $\phi_2$ are slowly varying compared to 
$e^{-i\omega_0 t}$. It follows from Eq. (7.63) that $\Gamma_{12}(\tau)$ and, hence, $\gamma_{12}(\tau)$ will be slowly 
varying functions of time compared to $e^{-i\omega_0\tau}$. It follows that we may write 
$\gamma_{12}$ as 

$$ \gamma_{12}(\tau) = |\gamma_{12}(\tau)|\,e^{i[\alpha_{12}(\tau) - \omega_0\tau]}, $$

where $|\gamma_{12}|$ and $\alpha_{12}$ are slowly varying compared to $\omega_0\tau$. The intensity at p 
may then be written 

$$ I = I_1 + I_2 + 2\sqrt{I_1I_2}\,|\gamma_{12}(\tau)|\cos[\alpha_{12}(\tau) - \omega_0\tau]. \qquad (7.69) $$




Fig. 7.14. Geometry used in defining the correlation function $\Gamma_{12}(\tau)$. 


Now, τ is simply the difference in arrival times at p of the disturbances 
originating at $\mathbf{r}_1$ and $\mathbf{r}_2$ and is given by 

$$ \tau = \frac{s_1 - s_2}{c}, $$

where $s_1$ and $s_2$ are the distances between p and $\mathbf{r}_1$ and $\mathbf{r}_2$, respectively. As the point p 
is moved across the two-slit diffraction pattern, the variation will be due 
primarily to $\omega_0\tau$ rather than $\alpha_{12}(\tau)$, which is slowly varying. Thus, the 
pattern will be essentially a cosine variation over several fringes. Using 
Michelson's definition of the fringe visibility, 

$$ \text{Visibility} = \frac{I_{\max} - I_{\min}}{I_{\max} + I_{\min}}, \qquad (7.70) $$

we obtain 

$$ \text{Visibility} = \frac{2\sqrt{I_1I_2}}{I_1 + I_2}\,|\gamma_{12}(\tau)|. $$

If the size of the slits is adjusted so that $I_1 = I_2$, this becomes 

$$ \text{Visibility} = |\gamma_{12}(\tau)|, $$

which may therefore be measured experimentally. In principle, knowing $\omega_0$ 
and τ as a function of position, one should be able to measure $\alpha_{12}(\tau)$ also. 
In practice, however, the variation of $\omega_0\tau$ is so large as to hide the dependence 
on $\alpha_{12}$. 
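The step from Eq. (7.69) to the visibility formula can be checked by scanning $\omega_0\tau$ over one fringe while holding $|\gamma_{12}|$ and $\alpha_{12}$ fixed, which is the slowly varying assumption made in the text. The Python sketch below (function names illustrative) applies Michelson's definition, Eq. (7.70), to the sampled pattern and compares it with $2\sqrt{I_1I_2}\,|\gamma_{12}|/(I_1 + I_2)$.

```python
import math


def fringe_intensity(i1, i2, gamma_abs, alpha, omega0_tau):
    """Eq. (7.69) with |gamma_12| and alpha_12 held constant."""
    return i1 + i2 + 2 * math.sqrt(i1 * i2) * gamma_abs * math.cos(alpha - omega0_tau)


def visibility_from_scan(i1, i2, gamma_abs, alpha=0.0, samples=2000):
    """Michelson visibility, Eq. (7.70), from one sampled fringe period."""
    values = [fringe_intensity(i1, i2, gamma_abs, alpha, 2 * math.pi * j / samples)
              for j in range(samples)]
    i_max, i_min = max(values), min(values)
    return (i_max - i_min) / (i_max + i_min)


def visibility_formula(i1, i2, gamma_abs):
    return 2 * math.sqrt(i1 * i2) * gamma_abs / (i1 + i2)


print(visibility_from_scan(1.0, 4.0, 0.5), visibility_formula(1.0, 4.0, 0.5))  # both 0.4
print(visibility_from_scan(1.0, 1.0, 0.6))   # equal intensities: equals |gamma|
```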




7.13 COMPLEX CORRELATION AND INCOHERENT SOURCES 


In the preceding section we have seen how a complex correlation function, 
$\gamma_{12}$, can be measured by using a two-slit interference experiment. We now 
wish to examine how the correlation function may be related to the sources 
of the electromagnetic field. When the actual time variation of the field is 
known or can be assumed to be known, as was the case in Chapter 6, where 
completely coherent illumination was assumed, then the calculation of the 
correlation function follows directly from the definition of Eqs. (7.63) and 
(7.64). We now examine the other extreme, an extended source whose 
individual elements radiate incoherently. 

In Fig. 7.15, $P_1$ and $P_2$ are the points for which we wish to calculate the 
function $\overline{V_1(t)V_2^*(t)}$. We are limiting ourselves to the case τ = 0. If the 
source is broken into small elements dA, then the variables $V_1$ and $V_2$ will 
be the sum of contributions $v_1\,dA$ and $v_2\,dA$, where $v\,dA$ is the contribution 
to V from the area of the source dA. We then write 

$$ V_1(t) = \int dA\,v_1(t), \qquad V_2(t) = \int dA\,v_2(t). $$

We use a prime to distinguish the dummy variables of integration and 
$\overline{V_1(t)V_2^*(t)}$ becomes 

$$ \overline{V_1V_2^*} = \iint dA\,dA'\,\overline{v_1(t)\,v_2'^*(t)}. $$

If δ represents a delta function which is zero unless dA and dA′ are the same, 
the integral may be broken into two parts: 

$$ \overline{V_1V_2^*} = \iint dA\,dA'\,\delta\,\overline{v_1v_2'^*} + \iint dA\,dA'\,(1-\delta)\,\overline{v_1v_2'^*} $$

$$ = \int dA\,\overline{v_1v_2^*} + \iint dA\,dA'\,(1-\delta)\,\overline{v_1v_2'^*}. $$


Fig. 7.15. Geometry used for computing the correlation function due to an inco- 
herent source. 




The element $\overline{v_1(t)v_2'^*(t)}$ in the second integral is recognized as a correlation function between 
different parts of the source. Since we are assuming a completely incoherent 
source, this must vanish through the time average. We are left with 

$$ \overline{V_1(t)V_2^*(t)} = \int dA\,\overline{v_1(t)v_2^*(t)}. \qquad (7.71) $$

Restricting ourselves to the pseudomonochromatic case, we put 

$$ v(t) = a(t)\,e^{i\phi(t)}\,e^{-i\omega_0 t}, $$

and Eq. (7.71) becomes 

$$ \overline{V_1(t)V_2^*(t)} = \int dA\,\overline{a_1(t)\,a_2(t)\,e^{i[\phi_1(t) - \phi_2(t)]}}. \qquad (7.72) $$


Now, since $v_1$ and $v_2$ are disturbances which originate from a common 
source element at dA, they differ, if we assume isotropic radiation, only 
through different arrival times at the points $P_1$ and $P_2$. Thus, writing $k = \omega_0/c$, 

$$ a_2(t) = a_1\!\left(t - \frac{r_2 - r_1}{c}\right), \qquad \phi_2(t) = \phi_1\!\left(t - \frac{r_2 - r_1}{c}\right) + k(r_2 - r_1). \qquad (7.73) $$

If the source dA were a perfect monochromatic radiator, Eq. (7.73) would 
simplify, when $r_1$ and $r_2$ are approximately equal, to 

$$ a_2 = a_1 \qquad (7.74) $$

and 

$$ \phi_1 - \phi_2 = k(r_1 - r_2). \qquad (7.75) $$


We wish to examine the extent to which Eqs. (7.74) and (7.75) may be used 
in Eq. (7.72) for the pseudomonochromatic case. 

We have indicated in Section 7.10 that $\Delta\omega/\omega_0$ is a measure of the ratio
of the variation rates of the envelope and the carrier of a pseudomonochromatic
wave. This follows from taking the frequency, $f = \omega/2\pi$, as the
rate of variation of an individual monochromatic wave. The reciprocal of
the frequency, the period $T$, is a reasonable measure of the minimum time in
which the wave varies appreciably. Thus, for a pseudomonochromatic wave of
bandwidth $\Delta f$, the quantity $T_c = 1/\Delta f$, known as the coherency time,
may be taken as the length of time over which we may approximate the wave by
a monochromatic wave. Associated with the coherency time is the coherency
length, $L_c = T_c c$. Over this distance in space a pseudomonochromatic wave may
be approximated by a monochromatic wave of wavelength $\lambda_0$.
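As a rough numerical illustration of these definitions, the short sketch below evaluates $T_c = 1/\Delta f$ and $L_c = T_c c$ for an assumed carrier wavelength of 550 nm and an assumed relative bandwidth $\Delta\omega/\omega_0 = 10^{-3}$. The function name is illustrative only. Note that $L_c/\lambda_0$ reduces to $\omega_0/\Delta\omega$, so the result depends only on the relative bandwidth.

```python
C = 2.998e8  # speed of light in vacuum, m/s

def coherence_time_and_length(wavelength_m, relative_bandwidth):
    """Coherency time T_c = 1/Delta_f and coherency length L_c = T_c * c.

    relative_bandwidth is Delta_omega / omega_0 (equal to Delta_f / f_0).
    Returns (T_c in s, L_c in m, L_c measured in carrier wavelengths).
    """
    f0 = C / wavelength_m                 # carrier frequency f_0
    delta_f = relative_bandwidth * f0     # bandwidth Delta_f
    t_c = 1.0 / delta_f                   # coherency time
    l_c = t_c * C                         # coherency length
    return t_c, l_c, l_c / wavelength_m

t_c, l_c, n_waves = coherence_time_and_length(550e-9, 1e-3)
print(f"T_c = {t_c:.2e} s, L_c = {l_c * 1e3:.2f} mm, about {n_waves:.0f} wavelengths")
```

Under these assumptions the coherency length is of the order of $10^3$ wavelengths, i.e. a fraction of a millimeter.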

In the example to follow, the source will be taken as a star; the points
$P_1$ and $P_2$ will be separated by approximately 10 meters along a line perpendicular
to the direction of the star, and the angular diameter of the star as
viewed from the earth may be put at about 0.05 seconds of arc. This leads
to a maximum variation in $r_1 - r_2$ of a few wavelengths for light in the
visible range. Since the coherency length may be taken as a few hundred
wavelengths when $\Delta\omega/\omega_0 \approx 10^{-3}$, we may safely make use of Eqs. (7.74)
and (7.75). Equation (7.72) thus becomes


$$\overline{V_1(t)V_2^*(t)} = \int dA\, a^2\, e^{ik(r_1 - r_2)}. \qquad (7.76)$$
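The claim that $r_1 - r_2$ varies by only a few wavelengths across the star can be checked with simple arithmetic. Anticipating the linear dependence of $r_1 - r_2$ on the source coordinates derived below, its total variation over the disk is roughly the baseline times the angular diameter. The baseline and angular diameter are the values quoted above; the 550 nm wavelength and the relative bandwidth of $10^{-3}$ are assumptions.

```python
import math

ARCSEC = math.pi / (180.0 * 3600.0)   # one second of arc in radians

baseline = 10.0                # separation of P1 and P2, m
angular_diameter = 0.05 * ARCSEC
wavelength = 550e-9            # assumed visible wavelength, m
relative_bandwidth = 1e-3      # assumed Delta_omega / omega_0

path_variation = baseline * angular_diameter          # spread of r1 - r2 over the disk
coherence_length = wavelength / relative_bandwidth    # L_c = lambda_0 * omega_0 / Delta_omega

print(f"spread of r1 - r2: {path_variation / wavelength:.1f} wavelengths")
print(f"coherency length:  {coherence_length / wavelength:.0f} wavelengths")
```

The spread is a few wavelengths while the coherency length is hundreds of times larger, which is what justifies replacing Eq. (7.73) by Eqs. (7.74) and (7.75).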


Now, as indicated in Fig. 7.15, we limit ourselves to the case where the
source is confined to a plane and the element of area $dA$ is located by the
coordinates $\xi$ and $\eta$. The points $P_1$ and $P_2$ are located in the $xy$-plane,
which is taken parallel to the $\xi\eta$-plane. The distances $r_1$ and $r_2$ are given by


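$$r_1 = \left[R^2 + (x_1 - \xi)^2 + (y_1 - \eta)^2\right]^{1/2}, \qquad r_2 = \left[R^2 + (x_2 - \xi)^2 + (y_2 - \eta)^2\right]^{1/2}.$$

The distance from the earth to the star is $R$. These may be approximated as

$$r_1 \approx R + \frac{(x_1 - \xi)^2 + (y_1 - \eta)^2}{2R}, \qquad r_2 \approx R + \frac{(x_2 - \xi)^2 + (y_2 - \eta)^2}{2R}.$$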
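After squaring the indicated terms, $r_1 - r_2$ becomes

$$r_1 - r_2 \approx \psi - \frac{(x_1 - x_2)\,\xi + (y_1 - y_2)\,\eta}{R},$$

where $\psi$, which is given by

$$\psi = \frac{1}{2R}\left(x_1^2 + y_1^2 - x_2^2 - y_2^2\right),$$

does not depend on $\xi$ or $\eta$ and may be taken outside the integral in Eq. (7.76).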
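The approximation can be checked numerically. A convenient exact form is $r_1 - r_2 = (r_1^2 - r_2^2)/(r_1 + r_2)$, which avoids subtracting two nearly equal large numbers; the approximation above amounts to replacing $r_1 + r_2$ by $2R$, so its relative error is of second order in (offset/$R$). The geometry below (a distance of 10 km and offsets of a few meters) is an arbitrary choice made only to keep the check readable, and the function names are hypothetical.

```python
import math

def path_difference_exact(R, p1, p2, source):
    """r1 - r2 computed as (r1^2 - r2^2) / (r1 + r2) to avoid cancellation."""
    (x1, y1), (x2, y2), (xi, eta) = p1, p2, source
    d1 = (x1 - xi) ** 2 + (y1 - eta) ** 2
    d2 = (x2 - xi) ** 2 + (y2 - eta) ** 2
    r1 = math.sqrt(R ** 2 + d1)
    r2 = math.sqrt(R ** 2 + d2)
    return (d1 - d2) / (r1 + r2)

def path_difference_approx(R, p1, p2, source):
    """psi - [(x1 - x2) xi + (y1 - y2) eta] / R, the far-field form above."""
    (x1, y1), (x2, y2), (xi, eta) = p1, p2, source
    psi = (x1 ** 2 + y1 ** 2 - x2 ** 2 - y2 ** 2) / (2.0 * R)
    return psi - ((x1 - x2) * xi + (y1 - y2) * eta) / R

R = 1.0e4                          # assumed distance, m
p1, p2 = (5.0, 1.0), (-5.0, 0.5)   # assumed observation points, m
sources = [(0.0, 0.0), (2.0, -1.5), (-3.0, 2.5)]
for source in sources:
    exact = path_difference_exact(R, p1, p2, source)
    approx = path_difference_approx(R, p1, p2, source)
    print(f"source {source}: exact {exact:.9e} m, approx {approx:.9e} m")
```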
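If we now make the substitutions

$$\alpha = \frac{x_1 - x_2}{R}, \qquad \beta = \frac{y_1 - y_2}{R},$$

Eq. (7.76) becomes

$$\overline{V_1(t)V_2^*(t)} = e^{ik\psi} \iint d\xi\, d\eta\, a^2\, e^{-ik(\alpha\xi + \beta\eta)}. \qquad (7.77)$$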


The intensities $I_1$ and $I_2$ at points $P_1$ and $P_2$ are given by

$$I_1 = I_2 = \iint d\xi\, d\eta\, a^2(\xi, \eta). \qquad (7.78)$$


Combining Eqs. (7.78), (7.77), and (7.64), we obtain for the complex degree
of coherence between points one and two

$$\gamma_{12}(0) = e^{ik\psi}\, \frac{\iint d\xi\, d\eta\, a^2\, e^{-ik(\alpha\xi + \beta\eta)}}{\iint d\xi\, d\eta\, a^2}. \qquad (7.79)$$

Equation (7.79) is a form of the Van Cittert-Zernike theorem.


7.14 THE MICHELSON STELLAR INTERFEROMETER 


In Fig. 7.16 aspects of the last two sections are combined. The source $S$ is
taken as a star a distance $R$ from earth. Lens $L$ is the objective of a telescope
which would ordinarily produce the Fraunhofer diffraction pattern of its
aperture in its focal plane at point $p$. The lens, however, is masked, leaving
two slits separated by a distance $d$, at points $P_1$ and $P_2$. As described in
Section 7.12, a pattern of fringes will now be obtained at point $p$. The
visibility of these fringes will be given by $|\gamma_{12}(0)|$. But $\gamma_{12}(0)$ will be determined
according to Eq. (7.79) by the intensity distribution $a^2(\xi, \eta)$ across
the face of the star. By measuring the visibility of the fringes as a function
of the slit separation $d$, we obtain information about $\gamma_{12}$, and from this we
may infer information about the size of the star.

Fig. 7.16. The Michelson stellar interferometer.

To evaluate Eq. (7.79) we take the intensity across the disk of the
star to be constant. By Eq. (7.79), $\gamma_{12}(0)$ should then be given by the
proportionality

$$\gamma_{12}(0) \propto \iint d\xi\, d\eta\, e^{-ik(\alpha\xi + \beta\eta)}, \qquad (7.80)$$


where the integration is over the disk of the star of radius $\rho$. The integral
is recognized as the one encountered in Section 6.9 for the Fraunhofer
diffraction pattern of a circular aperture. By the result of that section $\gamma_{12}$
should therefore be proportional to $2J_1(v)/v$, where $v = kd\rho/R$ and $J_1(v)$ is
Bessel's function of order one of the first kind. The proportionality may be
made an equality by noting that when $d = 0$, we also have $\alpha = \beta = v = 0$.
In this case Eq. (7.79) reduces to $|\gamma_{12}| = |\gamma_{11}| = 1$, which is also the value
of $2J_1(v)/v$ for $v = 0$. We then can write

$$|\gamma_{12}(0)| = \left|\frac{2J_1(v)}{v}\right|. \qquad (7.81)$$
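Eq. (7.81) can be checked by evaluating the Van Cittert-Zernike integral of Eq. (7.79) directly for a uniformly bright disk. In the minimal sketch below the star's radius is scaled to one and the slits are assumed to lie along the $x$ direction, so that $\beta = 0$ and $v = k\alpha\rho$. The $\eta$ integration over a chord of the disk is done analytically, and the remaining integral is computed with the midpoint rule after the substitution $\xi = \sin\theta$. $J_1$ is evaluated from its power series rather than from a library, only to keep the example dependency-free.

```python
import math

def bessel_j1(v, terms=40):
    """J_1(v) from its power series sum_m (-1)^m (v/2)^(2m+1) / (m! (m+1)!)."""
    return sum((-1) ** m * (v / 2.0) ** (2 * m + 1)
               / (math.factorial(m) * math.factorial(m + 1)) for m in range(terms))

def disk_coherence(v, n=2000):
    """|gamma_12(0)| for a uniform disk of unit radius, by direct quadrature of Eq. (7.79).

    Integrating exp(-i v xi) over the unit disk, the eta-integral gives the chord
    length 2*sqrt(1 - xi^2) and the sine part vanishes by symmetry. Dividing by the
    disk area pi and substituting xi = sin(theta) leaves
    (2/pi) * integral over [-pi/2, pi/2] of cos^2(theta) * cos(v sin(theta)),
    which equals 1 at v = 0.
    """
    h = math.pi / n
    total = 0.0
    for j in range(n):
        theta = -math.pi / 2.0 + (j + 0.5) * h      # midpoint rule
        total += math.cos(theta) ** 2 * math.cos(v * math.sin(theta))
    return abs(2.0 / math.pi * total * h)

for v in (0.5, 2.0, 3.8317, 5.0):
    print(f"v = {v:6.4f}: direct integral {disk_coherence(v):.6f}, "
          f"|2 J1(v)/v| = {abs(2.0 * bessel_j1(v) / v):.6f}")
```

Because the integrand is smooth and periodic in $\theta$, the midpoint rule converges very quickly here, and the two columns agree to many decimal places; both vanish near $v = 3.8317$, the first zero of $J_1$.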


As the separation of the slits is increased, the visibility of the fringes
will decrease from a maximum of one, at zero separation, and become
zero when the first zero of $J_1(v)$ is reached at $v = 1.22\pi$. Thus, if the fringes
vanish for a slit separation $d$, the angular diameter of the star, $2\rho/R$, is given by

$$2\rho/R = 1.22\lambda/d. \qquad (7.82)$$


In practice the experimental arrangement cannot be exactly as discussed.
For example, Michelson's measurement of the star Betelgeuse yielded an
angular diameter for this star of 0.047 seconds of arc. Assuming an effective
wavelength of 5500 Å, Eq. (7.82) places the first vanishing of the fringes at a
slit separation of about 2.9 meters, which exceeds the 2.5-meter (100-inch)
aperture of the telescope on which Michelson mounted his interferometer.
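The separation at which the fringes first vanish follows directly from Eq. (7.82). The short calculation below uses the Betelgeuse figures quoted above (0.047 seconds of arc and 5500 Å). The constant 3.8317 is the first zero of $J_1$, whose ratio to $\pi$ is the factor 1.22 in Eq. (7.82); the function name is hypothetical.

```python
import math

ARCSEC = math.pi / (180.0 * 3600.0)  # one second of arc in radians
FIRST_ZERO_J1 = 3.8317               # first zero of J_1; 3.8317 / pi = 1.22

def null_separation(angular_diameter_arcsec, wavelength_m):
    """Slit (or outer-mirror) separation d at which the fringes first vanish, Eq. (7.82)."""
    theta = angular_diameter_arcsec * ARCSEC      # 2 rho / R in radians
    return (FIRST_ZERO_J1 / math.pi) * wavelength_m / theta

d = null_separation(0.047, 5500e-10)
print(f"first null of the fringes at d = {d:.2f} m")
```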

In order to overcome this difficulty, Michelson used four small auxiliary 
mirrors, as shown in Fig. 7.16. The separation of the slits remains fixed 
and the distance between the two outer mirrors is varied. In this arrangement 
it is the separation of these outer mirrors which must be inserted for d in 
Eq. (7.82). 


7.15 COHERENCE AND POLARIZATION 


Until now we have treated only a single field variable. The field, of course,
consists of three such components. If we limit ourselves to waves which are 
at least approximately plane waves we may take the component along the 
direction of propagation to be zero and consider only two Cartesian com- 
ponents which are perpendicular to this direction. We take these components 
to be along the x- and y-axes. We modify our notation to include a subscript 
indicating the direction of the field component. For example, $V_x^r(\mathbf{r}_1, t)$ is


the real part of the complex field variable associated with the x-component 
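of the field. There are now four possible correlation functions which may be
arranged in a correlation matrix

$$J_{12} = \begin{pmatrix} \Gamma_{xx}(\mathbf{r}_1, \mathbf{r}_2, \tau) & \Gamma_{xy}(\mathbf{r}_1, \mathbf{r}_2, \tau) \\ \Gamma_{yx}(\mathbf{r}_1, \mathbf{r}_2, \tau) & \Gamma_{yy}(\mathbf{r}_1, \mathbf{r}_2, \tau) \end{pmatrix}, \qquad (7.83)$$

where

$$\Gamma_{ij}(\mathbf{r}_1, \mathbf{r}_2, \tau) = \overline{V_i(\mathbf{r}_1, t + \tau)\, V_j^*(\mathbf{r}_2, t)}. \qquad (7.84)$$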


We will not consider the general case given by Eq. (7.83) but will limit
ourselves to $\mathbf{r}_1 = \mathbf{r}_2$, $\tau = 0$. Since the points $\mathbf{r}_1$ and $\mathbf{r}_2$ are coincident, in
what follows we drop the subscripts on $J_{12}$ and write $J_{12} = J$.

Just as $\gamma_{12}$, as defined by Eq. (7.64), is a measure of the degree of
coherence between a single field variable at two different points in space, the
quantity

$$\mu_{xy} = \frac{\Gamma_{xy}}{\sqrt{\Gamma_{xx}}\,\sqrt{\Gamma_{yy}}} \qquad (7.85)$$


may be taken as a measure of the coherence between two different components
of the field at the same point in space. Again appealing to the Schwarz
inequality and the definition of $\Gamma_{ij}$ by Eq. (7.84), it is clear that

$$0 \le |\mu_{xy}| \le 1. \qquad (7.86)$$


In Section 7.5 we have seen that in the plane wave representation of
light, the Jones vector may be taken as proportional to the field vector:

$$|S\rangle \propto \begin{pmatrix} V_x \\ V_y \end{pmatrix}.$$

Within this same representation, it should be apparent that the coherency
matrix $J$ is simply the time average of the projection matrix $|S\rangle\langle S|$ associated
with the Jones vector $|S\rangle$.

Thus, the matrix $J$ is an old acquaintance, arrived at from a new point
of view. Since most of Chapter 4 was spent in a study of $|S\rangle\langle S|$ and its
expansion in terms of the Stokes parameters, we do not need to develop
the mathematics again. It will be useful, however, to write the Stokes
parameters in terms of the $\Gamma_{ij}$. Following exactly the procedures of Chapter
4, we obtain

$$\begin{aligned} P_0 &= \Gamma_{xx} + \Gamma_{yy}, \\ P_1 &= \Gamma_{xx} - \Gamma_{yy}, \\ P_2 &= \Gamma_{xy} + \Gamma_{yx}, \\ P_3 &= i(\Gamma_{xy} - \Gamma_{yx}), \end{aligned} \qquad (7.87)$$




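which may easily be inverted to yield the $\Gamma_{ij}$ as functions of the Stokes
parameters:

$$\begin{aligned} \Gamma_{xx} &= \tfrac{1}{2}(P_0 + P_1), \\ \Gamma_{yy} &= \tfrac{1}{2}(P_0 - P_1), \\ \Gamma_{xy} &= \tfrac{1}{2}(P_2 - iP_3), \\ \Gamma_{yx} &= \tfrac{1}{2}(P_2 + iP_3). \end{aligned} \qquad (7.88)$$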


Since the measurement of the Stokes parameters was discussed
independently of any representation in Chapter 4, we need not examine the
measurement of the $\Gamma_{ij}$.
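Eqs. (7.87) and (7.88) are simple enough to code directly. The minimal sketch below treats $\Gamma_{xx}$ and $\Gamma_{yy}$ as real numbers and $\Gamma_{xy}$ as complex with $\Gamma_{yx} = \Gamma_{xy}^*$. The sign attached to $P_3$ follows the form written in Eq. (7.87) above and should be regarded as an assumption: a different handedness convention in Chapter 4 would flip it. The function names and the example Jones vector are hypothetical.

```python
def stokes_from_coherency(gxx, gyy, gxy):
    """Eq. (7.87): Stokes parameters (P0, P1, P2, P3) from coherency-matrix elements.

    gxx, gyy are real time averages; gxy is complex, and gyx = gxy.conjugate().
    """
    gxy = complex(gxy)
    gyx = gxy.conjugate()
    p0 = gxx + gyy
    p1 = gxx - gyy
    p2 = (gxy + gyx).real            # 2 Re(Gamma_xy)
    p3 = (1j * (gxy - gyx)).real     # -2 Im(Gamma_xy)
    return p0, p1, p2, p3

def coherency_from_stokes(p0, p1, p2, p3):
    """Eq. (7.88): the inverse relations, returning (Gamma_xx, Gamma_yy, Gamma_xy)."""
    return 0.5 * (p0 + p1), 0.5 * (p0 - p1), 0.5 * (p2 - 1j * p3)

# A fully polarized example with assumed Jones vector components Vx, Vy:
vx, vy = 0.6, 0.8j
J = (abs(vx) ** 2, abs(vy) ** 2, vx * vy.conjugate())
P = stokes_from_coherency(*J)
print("Stokes parameters:", P)
print("P0^2 - (P1^2 + P2^2 + P3^2) =", P[0] ** 2 - (P[1] ** 2 + P[2] ** 2 + P[3] ** 2))
print("round trip:", coherency_from_stokes(*P))
```

For a single Jones vector the coherency matrix is a pure projection, so $P_0^2 = P_1^2 + P_2^2 + P_3^2$ up to rounding, and the round trip through Eq. (7.88) returns the original matrix elements.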

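It remains, then, to compare in some way the theory of partial polarization
with that of partial coherence. In Chapter 4 the degree of polarization
was defined as

$$\mathcal{P} = \frac{\left(P_1^2 + P_2^2 + P_3^2\right)^{1/2}}{P_0}. \qquad (7.89)$$

Using Eq. (7.87), we may write this in terms of the $\Gamma_{ij}$:

$$1 - \mathcal{P}^2 = \frac{4\,\mathrm{Det}(J)}{(\Gamma_{xx} + \Gamma_{yy})^2}. \qquad (7.90)$$

Using Eq. (7.85), we obtain

$$1 - \mathcal{P}^2 = \frac{4\Gamma_{xx}\Gamma_{yy}}{(\Gamma_{xx} + \Gamma_{yy})^2}\left(1 - |\mu_{xy}|^2\right). \qquad (7.91)$$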


Now, $\Gamma_{xx}$ and $\Gamma_{yy}$ are real positive scalars, and it is easily shown by performing
the square that

$$4\Gamma_{xx}\Gamma_{yy} \le (\Gamma_{xx} + \Gamma_{yy})^2.$$

Eq. (7.91) then gives $1 - \mathcal{P}^2 \le 1 - |\mu_{xy}|^2$, and it follows immediately that

$$|\mu_{xy}| \le \mathcal{P}. \qquad (7.92)$$

Therefore, the degree of correlation between orthogonal components of the
electromagnetic field does not exceed the degree of polarization of the wave.
We now show that equality holds in Eq. (7.92) if the orientation
of the axes $x$ and $y$ is chosen correctly.

As was seen in Section 4.7, the degree of polarization, $\mathcal{P}$, is independent
of the choice of axes $x$ and $y$. This is not true of $\mu_{xy}$. The results of Section
4.5, however, indicate that we may pick a set of axes for which $\overline{|V_x|^2}$ and
$\overline{|V_y|^2}$ are equal, and therefore for which $\Gamma_{xx} = \Gamma_{yy}$ and $|\Gamma_{xy}| = |\Gamma_{yx}|$. If
these conditions are placed in Eq. (7.91), one finds

$$|\mu_{xy}| = \mathcal{P}. \qquad (7.93)$$

Thus, the maximum value of $|\mu_{xy}|$ which will be measured is equal to the
degree of polarization of the state in question.
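The inequality (7.92) and the equality (7.93) can be illustrated numerically. The sketch below builds a coherency matrix by time-averaging an assumed model of partially polarized light (a fixed elliptical component with a random overall phase, plus an unpolarized component with independent random phases in $x$ and $y$), compares $|\mu_{xy}|$ with $\mathcal{P}$ from Eq. (7.90), and then rotates the axes by the angle that makes $\Gamma_{xx} = \Gamma_{yy}$. The light model, parameter values, and function names are illustrative assumptions.

```python
import cmath
import math
import random

def coherency(samples):
    """Time averages Gamma_xx, Gamma_yy, Gamma_xy = <V_i V_j*> over (Vx, Vy) samples."""
    n = len(samples)
    gxx = sum(abs(vx) ** 2 for vx, _ in samples) / n
    gyy = sum(abs(vy) ** 2 for _, vy in samples) / n
    gxy = sum(vx * vy.conjugate() for vx, vy in samples) / n
    return gxx, gyy, gxy

def degree_of_polarization(gxx, gyy, gxy):
    """Eq. (7.90): 1 - P^2 = 4 Det(J) / (Gamma_xx + Gamma_yy)^2."""
    det = gxx * gyy - abs(gxy) ** 2
    return math.sqrt(max(0.0, 1.0 - 4.0 * det / (gxx + gyy) ** 2))

def mu_xy(gxx, gyy, gxy):
    """Magnitude of Eq. (7.85)."""
    return abs(gxy) / math.sqrt(gxx * gyy)

def rotate_axes(gxx, gyy, gxy, theta):
    """Coherency matrix for axes rotated by theta: V'x = c Vx + s Vy, V'y = -s Vx + c Vy."""
    c, s = math.cos(theta), math.sin(theta)
    cross = 2.0 * gxy.real                              # Gamma_xy + Gamma_yx
    new_xx = c * c * gxx + s * s * gyy + c * s * cross
    new_yy = s * s * gxx + c * c * gyy - c * s * cross
    new_xy = c * s * (gyy - gxx) + c * c * gxy - s * s * gxy.conjugate()
    return new_xx, new_yy, new_xy

rng = random.Random(0)
samples = []
for _ in range(5000):
    common = cmath.exp(1j * rng.uniform(0.0, 2.0 * math.pi))     # random overall phase
    polarized = (0.8 * common, 0.3 * cmath.exp(0.7j) * common)     # fixed ellipse
    unpolarized = (0.5 * cmath.exp(1j * rng.uniform(0.0, 2.0 * math.pi)),
                   0.5 * cmath.exp(1j * rng.uniform(0.0, 2.0 * math.pi)))
    samples.append((polarized[0] + unpolarized[0], polarized[1] + unpolarized[1]))

J = coherency(samples)
P = degree_of_polarization(*J)
print(f"P = {P:.4f}, |mu_xy| = {mu_xy(*J):.4f}")        # Eq. (7.92): |mu_xy| <= P

# Gamma'_xx - Gamma'_yy = cos(2 theta)(Gamma_xx - Gamma_yy) + sin(2 theta)(Gamma_xy + Gamma_yx);
# this choice of theta makes it vanish.
theta = 0.5 * math.atan2(-(J[0] - J[1]), 2.0 * J[2].real)
Jr = rotate_axes(*J, theta)
print(f"rotated axes: |mu_xy| = {mu_xy(*Jr):.4f}")     # Eq. (7.93): equals P
```

Because the trace and determinant of $J$ are unchanged by a rotation of axes, $\mathcal{P}$ is the same before and after; only $|\mu_{xy}|$ changes, rising to $\mathcal{P}$ when $\Gamma_{xx} = \Gamma_{yy}$.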




7.16 INTENSITY FLUCTUATIONS 


It has been seen that a correlation function $\Gamma_{12}(\mathbf{r}_1, \mathbf{r}_2, \tau)$ can be measured
by allowing the disturbances at points $\mathbf{r}_1$ and $\mathbf{r}_2$ to form a diffraction pattern
whose intensity is measured. In some ways it would be more appealing if
separate measurements could be made directly at $\mathbf{r}_1$ and $\mathbf{r}_2$ which could then
be compared to yield information about $\Gamma_{12}$. If we substitute the word
detect for the word measure, this is possible at the lower frequencies in the
electromagnetic spectrum, e.g., radio and television frequencies. At these
frequencies antennae and preamplifiers can detect and transmit the instantaneous
value of the field variable to a central receiver where correlation may
be performed. At optical frequencies this is impossible. In this range we
can detect only intensities $V^*V$, rather than the instantaneous field $V(t)$.
However, in some cases the slow intensity variations, as well as the rapid
variations of the field variables, yield information about $\Gamma_{12}(\tau)$.

We are thus interested in the relation between the correlation of field
variables

$$\Gamma_{12}(\tau) = \overline{V_1(t + \tau)\, V_2^*(t)}$$

and the correlation of intensities

$$\overline{I_1(t + \tau)\, I_2(t)} = \overline{V_1^*(t + \tau)\, V_1(t + \tau)\, V_2^*(t)\, V_2(t)}.$$


The relation between these measurables depends on a more detailed
knowledge of the manner in which the field variable fluctuates than we have so
far considered.

Whether or not we may actually measure $V(t)$ as a function of time,
our mathematics presumes the existence of this function. Within this mathematical
framework we may thus ask for the probability that within a
sufficiently small time interval, $dt$, the field variable takes on a particular
value, $V^r(t)$. We define the function $p(V^r)\,dV^r$ as the probability that
within the interval from $t$ to $t + dt$ the field variable has a value between
$V^r$ and $V^r + dV^r$.

The functional form of $p(V^r)$ will depend on the nature of the source
and its coherency properties. We shall consider only one case, that of a
thermal incoherent source. In this case we take the distribution $p(V^r)$ to
be a normal Gaussian distribution given by Eq. (7.94) and Fig. 7.17:

$$p(V^r) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(V^r)^2/2\sigma^2}. \qquad (7.94)$$

The parameter $\sigma$ is known as the standard deviation and is a measure of the
probable deviation of the variable from its average, which is zero.


Fig. 7.17. The Gaussian distribution (peak value $p_{\max} = 1/\sqrt{2\pi}\,\sigma$).

The choice of this distribution results from considering $V^r$ to be a sum
of many contributions from independently (randomly) radiating parts of an
incoherent source. Without attempting to specify necessary and sufficient
conditions, we know from the central limit theorem of statistics that the 
distribution of the sum of a large number of randomly varying quantities 
will approach a Gaussian distribution function, independent of the distribu- 
tion functions for the individual quantities. 
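The central-limit argument can be illustrated with a small simulation. Each sample of $V^r$ below is taken as the real part of a sum of many unit-amplitude contributions with independent, uniformly distributed phases, an assumed stand-in for the radiating parts of a thermal source. If the sum is nearly Gaussian, its variance should be half the number of contributions and its kurtosis, $\overline{(V^r)^4}/\big(\overline{(V^r)^2}\big)^2$, should be close to the Gaussian value 3.

```python
import math
import random

def field_sample(n_sources, rng):
    """Real part of a sum of n_sources unit phasors with independent random phases."""
    return sum(math.cos(rng.uniform(0.0, 2.0 * math.pi)) for _ in range(n_sources))

rng = random.Random(1)
n_sources = 50
values = [field_sample(n_sources, rng) for _ in range(20000)]

mean = sum(values) / len(values)
var = sum((v - mean) ** 2 for v in values) / len(values)
kurtosis = sum((v - mean) ** 4 for v in values) / len(values) / var ** 2

# For N independent cos(phase) terms the exact kurtosis is 3 - 1.5 / N, already close to 3.
print(f"mean = {mean:.3f}, variance = {var:.2f} (expected {n_sources / 2}), "
      f"kurtosis = {kurtosis:.2f} (Gaussian value 3)")
```

Each individual contribution is far from Gaussian (its kurtosis is 1.5), yet the sum of fifty of them is already very nearly normal, as the central limit theorem predicts.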

The standard deviation of the distribution may be determined by the
conditions, Eq. (7.59),

$$\overline{(V^r)^2} = \int (V^r)^2\, p(V^r)\, dV^r,$$

which may be evaluated, using the results of Appendix VII, to give

$$\overline{(V^r)^2} = \sigma^2. \qquad (7.95)$$


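Now, consider two field variables $V_1'$ and $V_2'$, which could be any two
of $V^r(\mathbf{r}_1)$, $V^i(\mathbf{r}_1)$, $V^r(\mathbf{r}_2)$, and $V^i(\mathbf{r}_2)$, which are assumed to have Gaussian
distributions but have a nonvanishing correlation $\overline{V_1'V_2'}/\sigma_1\sigma_2 = \rho$. The
appropriate bivariate Gaussian distribution

$$p(V_1', V_2', \sigma_1, \sigma_2, \rho)\, dV_1'\, dV_2',$$

which is the probability that $V_1'$ will lie between $V_1'$ and $V_1' + dV_1'$ and $V_2'$
simultaneously between $V_2'$ and $V_2' + dV_2'$, is given by

$$\frac{dV_1'\, dV_2'}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}} \exp\left\{-\frac{1}{2(1 - \rho^2)}\left[\left(\frac{V_1'}{\sigma_1}\right)^2 + \left(\frac{V_2'}{\sigma_2}\right)^2 - \frac{2\rho V_1' V_2'}{\sigma_1\sigma_2}\right]\right\}. \qquad (7.96)$$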
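Using this distribution, we may determine the relation between $\overline{V_1'^2 V_2'^2}$ and
$(\overline{V_1'V_2'})^2$ for the Gaussian variables $V_1'$ and $V_2'$:

$$\overline{V_1'^2 V_2'^2} = \iint dV_1'\, dV_2'\, V_1'^2 V_2'^2\, p(V_1', V_2', \sigma_1, \sigma_2, \rho).$$

The integration may be performed by use of Appendix VII, and we obtain

$$\overline{V_1'^2 V_2'^2} = \overline{V_1'^2}\;\overline{V_2'^2} + 2\left(\overline{V_1'V_2'}\right)^2. \qquad (7.97)$$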
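Eq. (7.97) is easy to confirm by sampling. The sketch below draws correlated Gaussian pairs with assumed standard deviations and correlation coefficient, constructing them from two independent standard normal variables so that their joint density has the bivariate form (7.96), and compares the two sides of Eq. (7.97). For these pairs both sides should approach $\sigma_1^2\sigma_2^2(1 + 2\rho^2)$.

```python
import math
import random

rng = random.Random(2)
s1, s2, rho = 1.5, 0.7, 0.6        # assumed sigma_1, sigma_2, and correlation coefficient
n = 200000

m11 = m22 = m12 = m1122 = 0.0
for _ in range(n):
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    v1 = s1 * z1
    v2 = s2 * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)   # gives <v1 v2> = rho s1 s2
    m11 += v1 * v1
    m22 += v2 * v2
    m12 += v1 * v2
    m1122 += v1 * v1 * v2 * v2
m11, m22, m12, m1122 = m11 / n, m22 / n, m12 / n, m1122 / n

lhs = m1122                          # <V1'^2 V2'^2>
rhs = m11 * m22 + 2.0 * m12 ** 2     # <V1'^2><V2'^2> + 2 <V1' V2'>^2
exact = s1 ** 2 * s2 ** 2 * (1.0 + 2.0 * rho ** 2)
print(f"lhs = {lhs:.3f}, rhs = {rhs:.3f}, exact Gaussian value = {exact:.3f}")
```

The agreement is statistical, so the two sides match only to within sampling error (well under a percent for this sample size).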


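Using Eq. (7.97), we may now examine the correlation between the intensities:

$$\overline{I_1(t + \tau)\, I_2(t)} = \overline{V_1^*(t + \tau)\, V_1(t + \tau)\, V_2^*(t)\, V_2(t)}. \qquad (7.98)$$

If the right-hand side of Eq. (7.98) is written out in terms of the real and
imaginary parts of $V_1$ and $V_2$, we obtain

$$\overline{I_1(t + \tau)\, I_2(t)} = \overline{(V_1^r)^2(V_2^r)^2} + \overline{(V_1^r)^2(V_2^i)^2} + \overline{(V_1^i)^2(V_2^r)^2} + \overline{(V_1^i)^2(V_2^i)^2}. \qquad (7.99)$$

By use of Eq. (7.97),

$$\overline{I_1 I_2} = \overline{(V_1^r)^2}\;\overline{(V_2^r)^2} + \overline{(V_1^r)^2}\;\overline{(V_2^i)^2} + \overline{(V_1^i)^2}\;\overline{(V_2^r)^2} + \overline{(V_1^i)^2}\;\overline{(V_2^i)^2} + 2\left[\left(\overline{V_1^rV_2^r}\right)^2 + \left(\overline{V_1^rV_2^i}\right)^2 + \left(\overline{V_1^iV_2^r}\right)^2 + \left(\overline{V_1^iV_2^i}\right)^2\right]. \qquad (7.100)$$

By use of Eqs. (7.59), (7.67), and (7.68), Eq. (7.100) becomes

$$\overline{I_1(t + \tau)\, I_2(t)} = \bar I_1 \bar I_2 + [\mathrm{Re}\,\Gamma_{12}(\tau)]^2 + [\mathrm{Im}\,\Gamma_{12}(\tau)]^2 = \bar I_1 \bar I_2 + |\Gamma_{12}(\tau)|^2 = \bar I_1 \bar I_2\left[1 + |\gamma_{12}(\tau)|^2\right]. \qquad (7.101)$$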
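Eq. (7.101) can be checked by simulating thermal light at $\tau = 0$. In the sketch below, two circular complex Gaussian fields with unit mean intensity are constructed so that $\overline{V_1V_2^*}$ equals a prescribed $\gamma$; this construction and the chosen values of $\gamma$ are assumptions made for illustration. The normalized intensity correlation $\overline{I_1I_2}/\bar I_1\bar I_2$ should then approach $1 + |\gamma|^2$.

```python
import cmath
import math
import random

def complex_gaussian(rng):
    """Circular complex Gaussian variable with <|a|^2> = 1 (each quadrature has variance 1/2)."""
    s = math.sqrt(0.5)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))

def normalized_intensity_correlation(gamma, n=100000, seed=3):
    """Estimate <I1 I2> / (<I1><I2>) for thermal fields with <V1 V2*> = gamma (unit intensities)."""
    rng = random.Random(seed)
    g = complex(gamma).conjugate()           # V2 = g V1 + h b gives <V1 V2*> = g* = gamma
    h = math.sqrt(1.0 - abs(gamma) ** 2)
    s1 = s2 = s12 = 0.0
    for _ in range(n):
        a, b = complex_gaussian(rng), complex_gaussian(rng)
        i1 = abs(a) ** 2
        i2 = abs(g * a + h * b) ** 2
        s1 += i1
        s2 += i2
        s12 += i1 * i2
    return (s12 / n) / ((s1 / n) * (s2 / n))

for gamma in (0.0, 0.5 * cmath.exp(0.3j), 0.9):
    print(f"|gamma| = {abs(gamma):.2f}: simulated {normalized_intensity_correlation(gamma):.3f}, "
          f"1 + |gamma|^2 = {1.0 + abs(gamma) ** 2:.3f}")
```

Only the modulus of $\gamma$ survives in the result: the phase of $\gamma$ (here $0.3$ rad in the second case) has no effect on the intensity correlation.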


Thus, measurement of an intensity correlation yields a knowledge of $|\gamma_{12}(\tau)|$.
A measurement made in this way may be put to the same uses as when $|\gamma_{12}|$
is obtained by measuring $\overline{V_1(t)V_2^*(t)}$. In some ways the intensity correlation
is the easier method. Hanbury-Brown and Twiss have used intensity correlation
for the determination of the angular diameter of stars. Figure 7.18 is
a schematic representation of their apparatus. Since high-quality imaging
of the star is not required, the mirrors $A_1$ and $A_2$ were searchlight mirrors
which focused the stellar image on photoelectric detectors $P_1$ and $P_2$. The
output of these detectors was fed through amplifiers $B$ to a multiplier $C$ and
then to an integrator $M$ which performed the time average. The actual
electronics involved was, of course, more complicated than this simple block
diagram indicates. The time delay $\Delta\tau$ which is introduced between $P_1$ and
$B_1$ compensates for the difference in arrival time at the two mirrors of the
light from the star. It should not be confused with the $\tau$ of $\gamma_{12}(\tau)$. What is
measured in this experiment is $|\gamma_{12}(0)|$. Because it is the slowly varying
intensities that are measured and compared, rather than the rapid amplitude
and phase variation of $V(t)$, the intensity correlation technique is less affected
by slight variation in path length, allowing longer base-lines. It is also less
influenced by atmospheric turbulence and "seeing" than is Michelson's
method. In Fig. 7.19 are shown the results of measurements made on the
star Sirius A. The curve is the theoretical result for an angular diameter of
0.0069 seconds of arc.

Fig. 7.18. Block diagram of the stellar intensity interferometer. (From Hanbury-Brown,
R., and R. Q. Twiss, Proc. Roy. Soc. A248, 201, 1958.)


[Figure: normalized correlation plotted against base-line (meters).]


Fig. 7.19. The correlation function as determined experimentally for the star 
Sirius A. (From Hanbury-Brown, R., and R. Q. Twiss, Proc. Roy. Soc. A248, 
235, 1958). 


PROBLEMS 


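7.1 In the presence of a conducting medium in which the current density and the
electric field are related by $\mathbf{J} = g\mathbf{E}$, where the conductivity $g$ is a constant, show that
the wave equation becomes

$$\left(\nabla^2 - \frac{1}{c^2}\frac{\partial^2}{\partial t^2} - \mu g\frac{\partial}{\partial t}\right)\mathbf{E}(\mathbf{r}, t) = 0.$$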


7.2 In terms of the electromagnetic plane wave representation of polarization
states, show that

$$\frac{1}{\sqrt{2}}\left\{|P_x\rangle + e^{i\pi/2}|P_y\rangle\right\}$$

represents an $|R\rangle$ state if the time dependence $e^{+i\omega t}$ is chosen in Eq. (7.17) instead of
$e^{-i\omega t}$.

7.3 The Poynting vector for a plane electromagnetic wave is in the direction of 
propagation of the wave. This conforms to our physical intuition. Consider a long 
straight wire carrying current J. Show that the Poynting vector is perpendicular to 
the wire and points in toward the center of the wire. Show that S vanishes as 
the resistance of the wire goes to zero, keeping J constant. Interpret this result 
physically. 

7.4 A large radar antenna emits a narrow beam carrying a total average power of
$10^6$ watts. Show that there is a reaction force on the antenna of $3.3 \times 10^{-3}$ newtons.


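7.5 Consider an electromagnetic wave traveling in the z-direction:

$$\mathbf{E} = \mathbf{E}_0 e^{i(kz - \omega t)}, \qquad \mathbf{H} = \mathbf{H}_0 e^{i(kz - \omega t)}.$$

Show that if $\mathbf{E}_0$ and $\mathbf{H}_0$ are not constants but are functions of the coordinates $x$
and $y$, the fields $\mathbf{E}$ and $\mathbf{H}$ must have nonzero z-components.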


7.6 Show that for normal incidence, $\phi = \phi' = \phi'' = 0$, about 4% of the incident
intensity of a wave is reflected at a glass-air interface when the index of refraction
of the glass is $n = 1.5$.


7.7 When $\phi + \phi'' = 90^\circ$ in Fig. 7.8, the reflected wave is completely polarized.
If the incident beam is completely unpolarized, show that the ratio of the reflected
intensity to the incident intensity in this case is

$$\frac{1}{2}\left(\frac{n^2 - 1}{n^2 + 1}\right)^2,$$

where $n$ is the index of refraction of the glass. The angle of incidence in this case is
known as the polarizing angle. When $n = 1.5$, show that the above equation yields
about 7.4% of the incident light reflected as totally polarized light.


7.8 Derive Eqs. (7.53) and (7.54). 
7.9 Verify Eqs. (7.59) and (7.60). 




7.10 If a real field variable is given by $V^r(t) = A\cos\omega_0 t$, show that Eqs. (7.56)
and (7.58) result in

$$v(\omega) = \frac{A}{2}\left[2\pi\delta(\omega + \omega_0) + 2\pi\delta(\omega - \omega_0)\right], \qquad V(t) = Ae^{-i\omega_0 t}.$$


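7.11 Explain from diffraction theory why the constant $C$ in Eq. (7.61) must be
purely imaginary.
7.12 For the complex signal $V(\mathbf{r}, t) = Ae^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)}$ show that

$$\Gamma_{12}(\tau) = A^2 e^{i[\mathbf{k}\cdot(\mathbf{r}_1 - \mathbf{r}_2) - \omega\tau]}.$$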


7.13 Consider a source of light emitting incoherently in the range $\lambda = 5000 \pm 1/2$ Å.
Let this source be used with the Mach-Zehnder interferometer of Chapter
2. Using simple arguments based on coherence length, estimate by how many wavelengths
one beam may be retarded relative to the other and still produce the results given in
Chapter 2. Show that this retardation is equivalent to inserting a plate of glass
($n = 1.5$) in one beam whose thickness is of the order of 1 mm.


7.14 Prove that $Ae^{i\omega_1 t} + Be^{i\omega_2 t} = 0$ implies that $\omega_1 = \omega_2$. Show in general that


implies 
On = Og. 
These results were used in deriving the laws of reflection and refraction. 


7.15 If $V^r(t) = A\sin\omega t$, show that the probability that the function at some
particular time takes on a value between $V^r$ and $V^r + dV^r$ is given by

$$p(V^r)\,dV^r = \frac{1}{\pi}\,\frac{dV^r}{\sqrt{A^2 - (V^r)^2}}.$$

7.16 An inebriated mathematician starts from a lamp-post, which we locate at 
x = 0, and takes equal steps one after another toward the right (+x) or the left 
($-x$) with equal probability. After taking a large number of steps, $N$, of equal
unit length, his most probable x-coordinate will be zero, as he is just as likely to
wander to the right as to the left. However, his most probable distance from the
lamp-post, $|x|$, is given by $\overline{x^2} = N$, $|x| = \sqrt{N}$. To show this, let $x(N)$ be his probable
distance from the origin after $N$ steps and explain why

$$[x(N + 1)]^2 = \tfrac{1}{2}\left\{[x(N) + 1]^2 + [x(N) - 1]^2\right\}.$$

Using finite induction, prove that $x(N) = \sqrt{N}$. Discuss how this result might be
applied to the incoherent addition of P-states whose lines are parallel. 


7.17 As a very crude model of a star, we picture a uniformly radiating core of
radius $R$ surrounded by a partially absorbing atmosphere of thickness $T$, with
$T \ll R$. Use this model to explain why the rim of a stellar disk is not as bright as the
central portions. How would this phenomenon, known as limb darkening, affect 
the determination of stellar diameters? 


CHAPTER 8 


THE MOMENTUM SPACE REPRESENTATION 


Elements of a mathematical language have been developed in preceding 
chapters with which the quantum mechanical description of physical reality 
can be stated. This description involves a state function or state vector 
from which all measurable aspects of a system can be determined. In Chapters 
2, 3, and 4 only polarization states were considered. In the last three chapters 
the space and time dependence of the state vectors was taken up. In 
this chapter the meaning of the state function is reexamined. In particular 
we shall see that a state vector which is a function of position in real space 
cannot rigorously be defined for the photon and that our description of 
light must be extended to include a more abstract momentum space. This 
is not to imply that the quantum mechanical formulation we have developed 
is invalid. Rather, this accurately reflects the difficulties, mentioned in 
Chapter 2, of locating a photon—for example, in one arm or the other of an 
interferometer. 


8.1 THE TIME DEPENDENCE OF THE STATE FUNCTION 


Consider a state function which is a function of position and whose bracket 
product is normalized to unity. The equation 


$$\langle\psi|\psi\rangle = \int d\mathbf{r}\, \psi^*\psi = 1 \qquad (8.1)$$


is interpreted to mean that the probability of finding the particle, somewhere, 
is identically one. It is the strict validity of this last statement that we wish to 
reexamine. 

Let us consider a "free" photon, one which is in a region of space devoid 
of other matter, and for which the probability of interacting with other 
matter is therefore zero over some finite time. Thus, the probability of 
finding this photon somewhere within this region must be constant over this 



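time. If the probability is constant, its time derivative must vanish. If
$|\psi\rangle$ is the state function of the photon, as in Eq. (8.1), we must have

$$\frac{\partial}{\partial t}\langle\psi|\psi\rangle = \frac{\partial}{\partial t}\int d\mathbf{r}\, \psi^*\psi = 0, \qquad (8.2)$$

which yields

$$\frac{\partial}{\partial t}\int d\mathbf{r}\, \psi^*\psi = \int d\mathbf{r}\left(\frac{\partial\psi^*}{\partial t}\psi + \psi^*\frac{\partial\psi}{\partial t}\right) = 2\,\mathrm{Re}\int d\mathbf{r}\, \psi^*\frac{\partial\psi}{\partial t}.$$

Now assume that no relationship necessarily exists between $\psi$ and its time
derivative $\partial\psi/\partial t$. If this is so, then one could imagine at some particular
time the possibility that $\psi = \partial\psi/\partial t$, which would imply that

$$\psi^*\psi = \psi^*\,\partial\psi/\partial t$$

would be real and positive. Thus,

$$\frac{\partial}{\partial t}\int d\mathbf{r}\, \psi^*\psi = 2\int d\mathbf{r}\, \psi^*\psi > 0,$$

which is contrary to Eq. (8.2).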

Thus, if our probability interpretation of the bracket product is correct,
there must exist some relationship between $\psi$ and $\partial\psi/\partial t$ at each instant of
time. The condition in its weakest form would simply deny that $\psi$ and
$\partial\psi/\partial t$ could ever be equal. The condition, however, must be strong enough
to guarantee that $\psi^*\partial\psi/\partial t$ cannot be everywhere positive.

As an example of such a relationship, let us consider a particle such as an 
electron moving in empty space. As was pointed out in Chapter 5, the opera- 
tor formalism is not limited to light and we should be able to write eigenvalue 
equations for electron state functions. For a classical particle of mass m 
the kinetic energy and the linear momentum are given by 


$$E = \tfrac{1}{2}mv^2, \qquad p = mv,$$

which leads to the relation

$$E = p^2/2m. \qquad (8.3)$$

To obtain the quantum mechanical equation for such a particle, we simply
reinterpret this as an operator equation. We thus make the replacements

$$E \to i\hbar\,\partial/\partial t, \qquad \mathbf{p} \to -i\hbar\nabla,$$

and obtain

$$i\hbar\frac{\partial\psi}{\partial t} = -\frac{\hbar^2}{2m}\nabla^2\psi, \qquad (8.4)$$

which is the Schrödinger equation for a free particle of mass $m$. We see that
in this case there is a relation between $\psi$ and $\partial\psi/\partial t$. If we know the state
function as a function of position, we may compute its time derivative by
using Eq. (8.4).

Let us try the same procedure for the photon. The difference is that Eq. 
(8.3) does not hold for the photon. For the photon the relation between 
energy and momentum is given by 


$$|\mathbf{p}| = E/c.$$

The absolute value sign may be removed by squaring both sides, giving

$$E^2 = p^2c^2. \qquad (8.5)$$

If this is interpreted as an operator equation, we obtain

$$-\hbar^2\frac{\partial^2\psi}{\partial t^2} = -\hbar^2c^2\nabla^2\psi,$$

or

$$\nabla^2\psi - \frac{1}{c^2}\frac{\partial^2\psi}{\partial t^2} = 0, \qquad (8.6)$$

which, not surprisingly, is the scalar wave equation obeyed by each component
of the electric and magnetic fields. Although this yields a relation
between $\psi$ and $\partial^2\psi/\partial t^2$, it imposes no condition on $\partial\psi/\partial t$. It is well known
that a unique solution of a second-order differential equation can be obtained
only after auxiliary conditions are imposed on both the function and its
first derivative.

The state functions for light that we have been using which are solutions 
of Eq. (8.6) may not be used with rigor to compute the probability of finding 
a photon in the volume dV. This interpretation for the photon which was 
used in earlier chapters must be viewed as an approximation. Its success in 
the study of diffraction indicates, however, that the approximation is a good 
one if the volume dV is not too small. 

The difficulties pointed out here do not indicate that we must search 
for another equation to replace Eq. (8.6). Rather, the results obtained 
reflect the difficulty of locating individual photons in space. In the following 
sections we shall see how $\partial\psi/\partial t$ may be determined if we abandon a
description of the photon in real space.




8.2 MOMENTUM REPRESENTATION OF CLASSICAL PARTICLES 


In the following sections we are going to see that a good state function may 
be obtained for light if we allow the state function to be a function of momen- 
tum rather than position. To perform the transition from real space to 
momentum space as clearly as possible, we will examine how it can be done 
for a classical particle. We confine the discussion to two-dimensional motion 
without loss of generality. The motion of a particle is usually considered
as given if its position is specified as a function of time. A particle thus traces 
out some curve in space with the coordinates given by 


x=x(t), y= y(t), 


as shown in Fig. 8.1. Since the particle is moving, it possesses at each point
of its path a velocity and a momentum whose direction is tangent to the
path and whose components are given by

$$p_x = m\frac{d}{dt}x(t), \qquad p_y = m\frac{d}{dt}y(t).$$

We now consider a two-dimensional momentum space whose axes are
labeled $p_x$ and $p_y$. At each point of the motion in real space the components of
the momentum give us a pair of numbers which determine a point in
momentum space. Thus, the particle traces out a path in momentum space, as
shown in Fig. 8.2.

Since the particular path shown starts from the origin in momentum
space, it represents a particle which starts from rest in real space at $t_0$. The
point where the path crosses the $p_x$-axis in momentum space corresponds to
the point $A$ in real space where the particle reaches its maximum y-value
and, hence, has zero y-velocity. In general, the shapes of the two paths in
real space and in momentum space are not apparently or simply related.


Fig. 8.1. The path of a particle in real space (two-dimensional).

Fig. 8.2. The path of the particle of Fig. 8.1 in momentum space.


For example, a particle which moves with constant velocity in real space is 
represented by a single point in momentum space. 

We have seen how the momentum space path is determined by the path 
in real space. It is easy to turn this procedure around. If we are given the 
path in momentum space as a function of time, we may obtain the path in 
real space through 


$$x(t) = x_0 + \frac{1}{m}\int_{t_0}^{t} p_x(t')\, dt', \qquad y(t) = y_0 + \frac{1}{m}\int_{t_0}^{t} p_y(t')\, dt'.$$


Thus, the whole story of the particle's motion in momentum space is known 
if we know its position in real space as a function of time and the motion in 
real space is known if we know the position of the particle in momentum 
space as a function of time and also know the initial position of the particle,
$x_0$ and $y_0$. In fact the entire content of classical mechanics can be cast in
the formalism of either real space or momentum space. However, at an 
elementary level the momentum space representation offers no particular 
advantage. 
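The reconstruction of the real-space path from the momentum-space path can be sketched numerically. The example below assumes a particle of mass $m$ that starts from rest at $t_0 = 0$ with $p_x = Ft$ and $p_y = P\sin\omega t$, so that its momentum-space path leaves the origin and returns to the $p_x$-axis at $t = \pi/\omega$, where $y$ reaches its maximum. The integrals for $x(t)$ and $y(t)$ are evaluated with the trapezoidal rule and compared with the exact results $x = Ft^2/2m$ and $y = P(1 - \cos\omega t)/m\omega$. All parameter values and the function name are arbitrary choices for illustration.

```python
import math

def positions_from_momenta(times, px, py, m, x0, y0):
    """x(t) = x0 + (1/m) * integral of p_x dt (likewise y), by the trapezoidal rule."""
    xs, ys = [x0], [y0]
    for k in range(1, len(times)):
        dt = times[k] - times[k - 1]
        xs.append(xs[-1] + 0.5 * (px[k] + px[k - 1]) * dt / m)
        ys.append(ys[-1] + 0.5 * (py[k] + py[k - 1]) * dt / m)
    return xs, ys

m, F, P, w = 2.0, 1.0, 3.0, 1.0               # assumed mass, force, amplitude, frequency
times = [k * 0.001 for k in range(3142)]      # t from 0 up to just below pi / w
px = [F * t for t in times]                   # momentum-space path, starting at the origin
py = [P * math.sin(w * t) for t in times]
xs, ys = positions_from_momenta(times, px, py, m, x0=0.0, y0=0.0)

t_end = times[-1]
exact_x = F * t_end ** 2 / (2.0 * m)
exact_y = P * (1.0 - math.cos(w * t_end)) / (m * w)
print(f"x({t_end:.3f}) = {xs[-1]:.6f}  (exact {exact_x:.6f})")
print(f"y({t_end:.3f}) = {ys[-1]:.6f}  (exact {exact_y:.6f})")
```

As the text notes, the momentum-space path alone fixes the shape of the real-space path only once the initial position ($x_0$, $y_0$) is supplied.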

Consider, however, not a single classical particle but a collection of them— 
say, the $10^{23}$ particles which make up a mole of an ideal gas. In this case one
has little interest in the actual location of the individual particles in real 
space. The measurable properties of the system, the pressure and tempera- 
ture, specific heats, etc., are determined by the distribution of velocities of the 




molecules. It is the gas considered as a distribution of points in momentum 
space that is physically meaningful. One is more interested in the gas 
density in momentum space, the number of molecules of mass m with 
momentum between $m\mathbf{v}$ and $m\mathbf{v} + m\,d\mathbf{v}$, than in the density in real space,
the number of particles within a real volume element $d\mathbf{r}$.

When the study of classical particles is replaced by the study of photons, 
the use of momentum space is absolutely necessary since, as we have seen, 
the location of photons in real space cannot be always rigorously defined 
and, hence, no density function in real space can always be rigorously given. 
Therefore, we proceed to rewrite the theory of the electromagnetic field 
in momentum space and then impose upon it the state vector interpretation. 


8.3 MAXWELL'S EQUATIONS IN MOMENTUM SPACE 


Let us assume a solution to Maxwell's equations which consists of two 
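vector functions of position, $\mathbf{E}(\mathbf{r})$, $\mathbf{H}(\mathbf{r})$. These functions may be written as
Fourier superpositions of plane waves of definite linear momentum $\hbar\mathbf{k}$ as

$$\mathbf{E}(\mathbf{r}) = \frac{1}{(2\pi)^{3/2}}\int d\mathbf{k}\, \mathcal{E}(\mathbf{k})\, e^{i\mathbf{k}\cdot\mathbf{r}}, \qquad \mathbf{H}(\mathbf{r}) = \frac{1}{(2\pi)^{3/2}}\int d\mathbf{k}\, \mathcal{H}(\mathbf{k})\, e^{i\mathbf{k}\cdot\mathbf{r}}. \qquad (8.7)$$

If $\mathbf{E}(\mathbf{r})$ and $\mathbf{H}(\mathbf{r})$ obey Maxwell's equations, we will determine the equations
which $\mathcal{E}$ and $\mathcal{H}$ obey. Since the momentum of a photon is given by $\mathbf{p} = \hbar\mathbf{k}$,
a description using a space whose coordinates are $k_x$, $k_y$, and $k_z$ will be
called either the K-space representation or the momentum space representation.
Consider the Maxwell equation, Eq. (7.3),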


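$$\nabla\times\mathbf{E}(\mathbf{r}) = -\frac{\partial\mathbf{B}(\mathbf{r})}{\partial t} = -\mu\frac{\partial\mathbf{H}(\mathbf{r})}{\partial t}.$$

By use of Eq. (8.7) the left-hand side may be written

$$\nabla\times\mathbf{E} = \frac{1}{(2\pi)^{3/2}}\,\nabla\times\int d\mathbf{k}\, \mathcal{E}(\mathbf{k})e^{i\mathbf{k}\cdot\mathbf{r}} = \frac{1}{(2\pi)^{3/2}}\int d\mathbf{k}\, \nabla\times\left[\mathcal{E}(\mathbf{k})e^{i\mathbf{k}\cdot\mathbf{r}}\right],$$

where we have taken the vector operator through the integral sign, since it
operates on the spatial coordinates and the integration is over the k-space
coordinates. By use of identity 7 of Appendix V, the integrand may be written

$$\nabla\times\left[\mathcal{E}(\mathbf{k})e^{i\mathbf{k}\cdot\mathbf{r}}\right] = \nabla e^{i\mathbf{k}\cdot\mathbf{r}}\times\mathcal{E}(\mathbf{k}) + \left[\nabla\times\mathcal{E}(\mathbf{k})\right]e^{i\mathbf{k}\cdot\mathbf{r}}.$$

The second term vanishes because $\mathcal{E}(\mathbf{k})$ is not a function of the spatial
coordinates. After taking the gradient in the first term, we obtain

$$\nabla\times\left[\mathcal{E}(\mathbf{k})e^{i\mathbf{k}\cdot\mathbf{r}}\right] = i\left[\mathbf{k}\times\mathcal{E}(\mathbf{k})\right]e^{i\mathbf{k}\cdot\mathbf{r}}.$$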


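Finally, the Maxwell equation, Eq. (7.3), becomes

$$\frac{1}{(2\pi)^{3/2}}\int d\mathbf{k}\left[i\mathbf{k}\times\mathcal{E}\, e^{i\mathbf{k}\cdot\mathbf{r}}\right] = -\frac{\mu}{(2\pi)^{3/2}}\int d\mathbf{k}\, \frac{\partial\mathcal{H}}{\partial t}\, e^{i\mathbf{k}\cdot\mathbf{r}}$$

or

$$\frac{1}{(2\pi)^{3/2}}\int d\mathbf{k}\left[i\mathbf{k}\times\mathcal{E} + \mu\frac{\partial\mathcal{H}}{\partial t}\right]e^{i\mathbf{k}\cdot\mathbf{r}} = 0.$$

Thus, the quantity in brackets has an inverse Fourier transform equal to zero,
and is therefore itself zero. We then have

$$i\mathbf{k}\times\mathcal{E}(\mathbf{k}) = -\mu\frac{\partial\mathcal{H}(\mathbf{k})}{\partial t}.$$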


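By similar arguments we may obtain the K-space representation for all of
Maxwell's equations. These are listed below beside their corresponding forms
in real space. The dot signifies partial differentiation with respect to time.

$$i\mathbf{k}\times\mathcal{E} = -\mu\dot{\mathcal{H}}, \qquad \nabla\times\mathbf{E} = -\mu\dot{\mathbf{H}}, \qquad (8.8)$$
$$i\mathbf{k}\times\mathcal{H} = \epsilon\dot{\mathcal{E}}, \qquad \nabla\times\mathbf{H} = \epsilon\dot{\mathbf{E}}, \qquad (8.9)$$
$$\mathbf{k}\cdot\mathcal{E} = 0, \qquad \nabla\cdot\mathbf{E} = 0, \qquad (8.10)$$
$$\mathbf{k}\cdot\mathcal{H} = 0, \qquad \nabla\cdot\mathbf{H} = 0. \qquad (8.11)$$

It must be remembered that $\mathbf{E}(\mathbf{r})$ and $\mathbf{H}(\mathbf{r})$ are real quantities. Since $\mathcal{E}$ and
$\mathcal{H}$ are related to these real quantities through a Fourier transform, $\mathcal{E}$ and $\mathcal{H}$
must obey a reality condition, as was shown in Section 7.9. Thus, in addition
to Eqs. (8.8) through (8.11), $\mathcal{E}$ and $\mathcal{H}$ obey

$$\mathcal{E}^*(\mathbf{k}) = \mathcal{E}(-\mathbf{k}), \qquad (8.12)$$
$$\mathcal{H}^*(\mathbf{k}) = \mathcal{H}(-\mathbf{k}). \qquad (8.13)$$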
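The reality conditions (8.12) and (8.13) have a simple discrete analogue that can be checked directly. The sketch below takes an arbitrary real function sampled on a periodic one-dimensional grid (an assumed example), computes its discrete Fourier transform with a plain $O(N^2)$ sum, and confirms that $F^*(q) = F(-q)$, where the index $-q$ is read modulo $N$. A complex-valued function, by contrast, generally violates the condition. The sign convention of the transform does not affect the result.

```python
import cmath
import math

def dft(samples):
    """Discrete Fourier transform F[q] = sum_j f[j] exp(-2 pi i q j / N)."""
    n = len(samples)
    return [sum(samples[j] * cmath.exp(-2j * math.pi * q * j / n) for j in range(n))
            for q in range(n)]

def reality_violation(spectrum):
    """Largest |F*(q) - F(-q)|, with -q taken modulo N on the periodic grid."""
    n = len(spectrum)
    return max(abs(spectrum[q].conjugate() - spectrum[(-q) % n]) for q in range(n))

N = 16
real_field = [math.exp(-((j - 5) / 2.0) ** 2) + 0.3 * math.cos(2 * math.pi * 3 * j / N)
              for j in range(N)]
complex_field = [cmath.exp(2j * math.pi * j / N) for j in range(N)]

print(f"real field:    max |F*(q) - F(-q)| = {reality_violation(dft(real_field)):.1e}")
print(f"complex field: max |F*(q) - F(-q)| = {reality_violation(dft(complex_field)):.1e}")
```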


8.4 THE FIELD VARIABLES IN K-SPACE 


Just as Maxwell's equations couple $\mathbf{E}$ and $\mathbf{H}$ in real space, Eqs. (8.8) through
(8.11) couple $\mathcal{E}$ and $\mathcal{H}$ in K-space. In real space we were able to decouple
Maxwell's equations and obtain wave equations. We now decouple Eqs.
(8.8) through (8.11) in K-space. Starting with Eq. (8.8), we vector multiply
both sides by $\mathbf{k}$, yielding

$$i\mathbf{k}\times(\mathbf{k}\times\mathcal{E}) = -\mu\,\mathbf{k}\times\dot{\mathcal{H}}.$$




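Expanding the left-hand side using vector identity 2 and taking k through the derivative on the right, we obtain

$$i[\mathbf{k}(\mathbf{k}\cdot\mathcal{E}) - \mathcal{E}(\mathbf{k}\cdot\mathbf{k})] = -\mu\frac{\partial}{\partial t}[\mathbf{k}\times\mathcal{H}].$$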


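The first term on the left is zero by Eq. (8.10), and the right-hand side may be replaced by using Eq. (8.9), which gives k × ℋ = −iε ∂ℰ/∂t. We then obtain

$$-ik^2\mathcal{E} = i\epsilon\mu\frac{\partial^2\mathcal{E}}{\partial t^2}$$

or

$$\left(\frac{\partial^2}{\partial t^2} + k^2c^2\right)\mathcal{E}(\mathbf{k}) = 0, \tag{8.14}$$

where c = (εμ)^{−1/2}. One can likewise show that ℋ obeys

$$\left(\frac{\partial^2}{\partial t^2} + k^2c^2\right)\mathcal{H}(\mathbf{k}) = 0. \tag{8.15}$$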


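In each case we are interested in solutions f(k) of the equation

$$\left(\frac{\partial^2}{\partial t^2} + k^2c^2\right)\mathbf{f}(\mathbf{k}) = 0,$$

which may be factored to yield

$$\left(\frac{\partial}{\partial t} + ikc\right)\left(\frac{\partial}{\partial t} - ikc\right)\mathbf{f}(\mathbf{k},t) = 0, \tag{8.16}$$

and we seek solutions f(k) which obey one of the following equations:

$$\left(\frac{\partial}{\partial t} + ikc\right)\mathbf{f}(\mathbf{k},t) = 0, \tag{8.17}$$

$$\left(\frac{\partial}{\partial t} - ikc\right)\mathbf{f}(\mathbf{k},t) = 0. \tag{8.18}$$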


A general solution of Eq. (8.16) will then be a linear combination of independent solutions of Eqs. (8.17) and (8.18).

Now, it is readily shown that if f(k, t) is a solution of either Eq. (8.17) or Eq. (8.18), the function f(k, −t) is a solution of the other. It is possible to set up a one-to-one correspondence between the solutions of these two equations in still another way. It is left to the exercises to show that if f(k, t) is a solution to either equation, its complex conjugate f*(k, t) is a solution of the other.

Since |−k| = |k| = k, we point out that if f(k) is a solution to either equation, f(−k) is also a solution to the same equation. In what follows we explicitly use the symbol f(k, t) to indicate a solution to Eq. (8.17). From the foregoing discussion, symbols such as f*(k, t), f(k, −t), etc., denote solutions to Eq. (8.18).
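These statements about Eqs. (8.17) and (8.18) can be checked numerically for a single Cartesian component, since each component obeys the same scalar equation. In this minimal sketch the values of k, c, the amplitude F0 and the time t0 are hypothetical, and the time derivative is approximated by a central difference; a residual near zero means that the function satisfies the equation being tested.

```python
import cmath

K, C = 2.0, 1.5          # hypothetical |k| and wave speed c
F0 = 0.7 - 0.3j          # hypothetical amplitude of one Cartesian component of f(k, 0)


def f(t):
    """A solution of Eq. (8.17): f(k, t) = f0 exp(-i k c t)."""
    return F0 * cmath.exp(-1j * K * C * t)


def residual(g, t, sign, h=1e-5):
    """(d/dt + sign*i*k*c) g at time t; sign=+1 tests Eq. (8.17), sign=-1 tests Eq. (8.18)."""
    dg_dt = (g(t + h) - g(t - h)) / (2 * h)   # central difference
    return dg_dt + sign * 1j * K * C * g(t)


t0 = 0.37
checks = {
    "f solves (8.17)": residual(f, t0, +1),
    "f* solves (8.18)": residual(lambda t: f(t).conjugate(), t0, -1),
    "f(k, -t) solves (8.18)": residual(lambda t: f(-t), t0, -1),
    "f* fails (8.17)": residual(lambda t: f(t).conjugate(), t0, +1),
}
for label, value in checks.items():
    print(f"{label:24s} |residual| = {abs(value):.2e}")
```

The last entry shows that the correspondence is genuinely between the two equations: the conjugate of a solution of Eq. (8.17) leaves a residual of magnitude 2kc|f|.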

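We may now write a general solution for ℰ(k) as

$$\mathcal{E}(\mathbf{k}) = A\,\mathbf{f}_1(\mathbf{k},t) + B\,\mathbf{f}_2^*(-\mathbf{k},t),$$

where A and B are independent of time but may be functions of k = |k|. We may impose the reality condition, Eq. (8.12), on this form by setting f_1 = f_2 = f and B = A, with A real, giving

$$\mathcal{E}(\mathbf{k}) = A[\mathbf{f}(\mathbf{k}) + \mathbf{f}^*(-\mathbf{k})]. \tag{8.19}$$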
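The reality condition which leads to the form of Eq. (8.19) also imposes a special form on ℰ̇(k). By direct differentiation of Eq. (8.19), we obtain

$$\dot{\mathcal{E}}(\mathbf{k}) = A\left[\frac{\partial\mathbf{f}(\mathbf{k})}{\partial t} + \frac{\partial\mathbf{f}^*(-\mathbf{k})}{\partial t}\right].$$

Since f and f* are, respectively, solutions of Eqs. (8.17) and (8.18), we may use these equations to obtain

$$\dot{\mathcal{E}}(\mathbf{k}) = -ikcA[\mathbf{f}(\mathbf{k}) - \mathbf{f}^*(-\mathbf{k})]. \tag{8.20}$$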
Combining Eqs. (8.19) and (8.20), we may solve for f(k) in terms of ℰ and ℰ̇:

$$\mathbf{f}(\mathbf{k}) = \frac{1}{2A}\left[\mathcal{E}(\mathbf{k}) - \frac{\dot{\mathcal{E}}(\mathbf{k})}{ikc}\right]. \tag{8.21}$$
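If we take the scalar product of this equation with k, we obtain

$$\mathbf{k}\cdot\mathbf{f} = \frac{1}{2A}\left[\mathbf{k}\cdot\mathcal{E} - \frac{\mathbf{k}\cdot\dot{\mathcal{E}}}{ikc}\right].$$

The first term on the right is zero by Eq. (8.10). The second term also vanishes, since, by Eq. (8.9),

$$\mathbf{k}\cdot\dot{\mathcal{E}} = \frac{i}{\epsilon}\,\mathbf{k}\cdot(\mathbf{k}\times\mathcal{H}),$$

and k · (k × ℋ) vanishes identically. We thus obtain an important condition on the function f(k), known as the transversality condition:

$$\mathbf{k}\cdot\mathbf{f}(\mathbf{k}) = 0. \tag{8.22}$$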
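Eqs. (8.19) through (8.21) form a reversible map between f and the pair (ℰ, ℰ̇). The following sketch, assuming a wave vector along z and made-up transverse amplitudes for f(k) and f(−k), builds ℰ and ℰ̇ at +k from Eqs. (8.19) and (8.20), recovers f(k) through Eq. (8.21), and confirms that k·ℰ, k·ℰ̇ and k·f vanish, as Eqs. (8.10) and (8.22) require. All numerical values are illustrative assumptions.

```python
A = 0.8                      # hypothetical real normalization constant A(k)
K_VEC = (0.0, 0.0, 2.0)      # hypothetical wave vector k, taken along z
C = 1.0                      # hypothetical wave speed
K = sum(x * x for x in K_VEC) ** 0.5   # k = |k|


def add(u, v, s=1):
    return tuple(a + s * b for a, b in zip(u, v))


def scale(c, v):
    return tuple(c * a for a in v)


def conj(v):
    return tuple(complex(a).conjugate() for a in v)


def dot(u, v):
    """Bilinear scalar product (no complex conjugation), as in k . f."""
    return sum(a * b for a, b in zip(u, v))


def norm(v):
    return sum(abs(a) ** 2 for a in v) ** 0.5


# Hypothetical transverse amplitudes f(k) and f(-k): no component along k.
f_plus = (0.3 + 0.1j, -0.2j, 0.0)
f_minus = (0.05, 0.4 - 0.2j, 0.0)

# Eqs. (8.19) and (8.20) at wave vector +k; f*(-k) is the conjugate of the -k amplitude.
E = scale(A, add(f_plus, conj(f_minus)))
E_dot = scale(-1j * K * C * A, add(f_plus, conj(f_minus), -1))

# Eq. (8.21): recover f(k) = [E - E_dot / (i k c)] / (2A).
f_rec = scale(1 / (2 * A), add(E, scale(1 / (1j * K * C), E_dot), -1))

print("round-trip error:", norm(add(f_rec, f_plus, -1)))
print("k.E, k.E_dot, k.f:", abs(dot(K_VEC, E)), abs(dot(K_VEC, E_dot)), abs(dot(K_VEC, f_rec)))
```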


8.5 THE STATE FUNCTION IN K-SPACE 


Culminating in Eqs. (8.19) and (8.20), we have related a function f(k, t) in momentum space to the Fourier transform ℰ(k, t) and, hence, to the real electric field E(r, t). Furthermore, we have done this in such a way that the reality condition on the field is automatically satisfied. Let us reconsider Eq. (8.17):


$$\frac{\partial\mathbf{f}}{\partial t} = -ikc\,\mathbf{f}. \tag{8.23}$$


We see that f(k, t) is a candidate for the state function in k-space, since Eq. (8.23) gives a direct relation between f and its first time derivative. To complete this identification we multiply both sides of Eq. (8.23) by iħ and obtain

$$i\hbar\frac{\partial\mathbf{f}}{\partial t} = \hbar kc\,\mathbf{f}. \tag{8.24}$$
Recalling the energy eigenvalue equation,

$$i\hbar\frac{\partial}{\partial t}|S\rangle = E|S\rangle, \tag{8.25}$$

and the fact that for the photon the energy is given by E = ħkc, the identification between Eq. (8.24) and Eq. (8.25) is complete.

Before the function f(k) may be uniquely related to a particular real field E(r), it is necessary to evaluate the constant A in Eq. (8.19). From Eq. (8.24), the energy operator in real space, iħ(∂/∂t), may be replaced in k-space by the multiplicative operator ħkc. Thus, the expectation value of the energy evaluated in the k-space representation should be given by

$$\langle S|\hbar kc|S\rangle = \int d\mathbf{k}\,\mathbf{f}^*\cdot(\hbar kc)\,\mathbf{f},$$


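and this should correspond to the classical energy of the corresponding electromagnetic field obtained from Eq. (7.23),

$$W = \frac{1}{2}\int d\mathbf{r}\,(\epsilon\mathbf{E}\cdot\mathbf{E} + \mu\mathbf{H}\cdot\mathbf{H}).$$

We thus demand that

$$\int d\mathbf{k}\,\mathbf{f}^*\cdot(\hbar kc)\,\mathbf{f} = \frac{1}{2}\int d\mathbf{r}\,[\epsilon\mathbf{E}(\mathbf{r})\cdot\mathbf{E}(\mathbf{r}) + \mu\mathbf{H}(\mathbf{r})\cdot\mathbf{H}(\mathbf{r})]. \tag{8.26}$$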


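Using Eq. (8.7), we insert the Fourier transforms into the classical energy integral and obtain

$$W = \frac{1}{2}\frac{1}{(2\pi)^3}\int d\mathbf{r}\left[\epsilon\int d\mathbf{k}\,\mathcal{E}(\mathbf{k})e^{i\mathbf{k}\cdot\mathbf{r}}\cdot\int d\mathbf{k}'\,\mathcal{E}(\mathbf{k}')e^{i\mathbf{k}'\cdot\mathbf{r}} + \mu\int d\mathbf{k}\,\mathcal{H}(\mathbf{k})e^{i\mathbf{k}\cdot\mathbf{r}}\cdot\int d\mathbf{k}'\,\mathcal{H}(\mathbf{k}')e^{i\mathbf{k}'\cdot\mathbf{r}}\right]$$

or

$$W = \frac{1}{2}\frac{1}{(2\pi)^3}\iint d\mathbf{k}\,d\mathbf{k}'\,[\epsilon\mathcal{E}(\mathbf{k})\cdot\mathcal{E}(\mathbf{k}') + \mu\mathcal{H}(\mathbf{k})\cdot\mathcal{H}(\mathbf{k}')]\int e^{i(\mathbf{k}+\mathbf{k}')\cdot\mathbf{r}}\,d\mathbf{r}.$$

From Appendix IV the integral over the spatial coordinates may be interpreted as a delta function, leading to the form

$$W = \frac{1}{2}\frac{1}{(2\pi)^3}\iint d\mathbf{k}\,d\mathbf{k}'\,[\epsilon\mathcal{E}(\mathbf{k})\cdot\mathcal{E}(\mathbf{k}') + \mu\mathcal{H}(\mathbf{k})\cdot\mathcal{H}(\mathbf{k}')]\,(2\pi)^3\delta(\mathbf{k}+\mathbf{k}').$$

After integration over the variable k′, we obtain

$$W = \frac{1}{2}\int d\mathbf{k}\,[\epsilon\mathcal{E}(\mathbf{k})\cdot\mathcal{E}(-\mathbf{k}) + \mu\mathcal{H}(\mathbf{k})\cdot\mathcal{H}(-\mathbf{k})].$$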
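Now, from Eq. (8.9),

$$\dot{\mathcal{E}}(\mathbf{k})\cdot\dot{\mathcal{E}}(-\mathbf{k}) = \left[\frac{i}{\epsilon}\mathbf{k}\times\mathcal{H}(\mathbf{k})\right]\cdot\left[-\frac{i}{\epsilon}\mathbf{k}\times\mathcal{H}(-\mathbf{k})\right],$$

and, using vector identity 3 together with Eq. (8.11), we obtain

$$\dot{\mathcal{E}}(\mathbf{k})\cdot\dot{\mathcal{E}}(-\mathbf{k}) = \frac{k^2}{\epsilon^2}\,\mathcal{H}(\mathbf{k})\cdot\mathcal{H}(-\mathbf{k}).$$

Thus,

$$W = \frac{\epsilon}{2}\int d\mathbf{k}\left[\mathcal{E}(\mathbf{k})\cdot\mathcal{E}(-\mathbf{k}) + \frac{1}{k^2c^2}\,\dot{\mathcal{E}}(\mathbf{k})\cdot\dot{\mathcal{E}}(-\mathbf{k})\right],$$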


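and, using Eqs. (8.19) and (8.20), we may write ℰ and ℰ̇ in terms of f(k), yielding

$$W = \epsilon\int d\mathbf{k}\,A^2[\mathbf{f}^*(\mathbf{k})\cdot\mathbf{f}(\mathbf{k}) + \mathbf{f}^*(-\mathbf{k})\cdot\mathbf{f}(-\mathbf{k})].$$

Making the change of dummy variable k → −k in the second term (A depends only on |k|), we can show that

$$\int d\mathbf{k}\,A^2\,\mathbf{f}^*(\mathbf{k})\cdot\mathbf{f}(\mathbf{k}) = \int d\mathbf{k}\,A^2\,\mathbf{f}^*(-\mathbf{k})\cdot\mathbf{f}(-\mathbf{k}),$$

and, therefore,

$$W = 2\epsilon\int d\mathbf{k}\,A^2\,\mathbf{f}^*(\mathbf{k})\cdot\mathbf{f}(\mathbf{k}).$$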


It is seen that we can satisfy Eq. (8.26) by setting

$$A = \left(\frac{\hbar kc}{2\epsilon}\right)^{1/2}. \tag{8.27}$$
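The choice of A in Eq. (8.27) can be tested on a single pair of wave vectors +k and −k, which is exactly where the change of dummy variable k → −k does its work. This sketch assumes illustrative values for ħ, c, ε and |k| (not physical units) and hypothetical amplitudes f(k) and f(−k); it evaluates the classical energy integrand built from Eqs. (8.19) and (8.20) and compares it with the contribution of the same pair to the left-hand side of Eq. (8.26).

```python
import math

HBAR, C, EPS = 1.0, 2.0, 0.5      # hypothetical units for hbar, c, and epsilon
K = 3.0                           # |k| shared by the pair of wave vectors +k and -k
A = math.sqrt(HBAR * K * C / (2 * EPS))   # Eq. (8.27)


def dot(u, v):
    """Bilinear product without conjugation, as in E(k) . E(-k)."""
    return sum(a * b for a, b in zip(u, v))


def conj(v):
    return [complex(a).conjugate() for a in v]


f_k = [0.3 + 0.4j, -0.1j, 0.0]    # hypothetical f(k), transverse to k = k e_z
f_mk = [0.2, 0.1 - 0.5j, 0.0]     # hypothetical f(-k)

# Eqs. (8.19) and (8.20) evaluated at +k and at -k.
E_k = [A * (a + complex(b).conjugate()) for a, b in zip(f_k, f_mk)]
E_mk = [A * (b + complex(a).conjugate()) for a, b in zip(f_k, f_mk)]
Ed_k = [-1j * K * C * A * (a - complex(b).conjugate()) for a, b in zip(f_k, f_mk)]
Ed_mk = [-1j * K * C * A * (b - complex(a).conjugate()) for a, b in zip(f_k, f_mk)]

# Classical energy carried by the pair: the integrand of
# W = (eps/2) int dk [E(k).E(-k) + E_dot(k).E_dot(-k)/(k c)^2] is the same at +k and -k.
W_pair = EPS * (dot(E_k, E_mk) + dot(Ed_k, Ed_mk) / (K * C) ** 2)

# Left-hand side of Eq. (8.26) for the same pair.
quantum = HBAR * K * C * (dot(conj(f_k), f_k) + dot(conj(f_mk), f_mk))
print(W_pair, quantum)
```

Changing A away from the value of Eq. (8.27) rescales W_pair by 2εA²/(ħkc), so the two numbers agree only for that normalization.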




Treating f(k) as the state function of the photon in momentum space 
leads to the correct correspondence between the classical energy and the 
expectation value of the energy when the normalization given by Eq. (8.27) 
is used. The normalization of Eq. (8.27) would be meaningless, however, 
if it did not lead to a correct correspondence between other classical pro- 
perties of the electromagnetic field and their expectation values in momentum 
space. In particular, we consider the expectation value of linear momentum. 

If f(k) is the state function of the photon, then

$$\mathbf{f}^*(\mathbf{k})\cdot\mathbf{f}(\mathbf{k})\,d\mathbf{k}$$

should be the probability that the photon will be found in the volume element dk of k-space and possess linear momentum ħk. The expectation value for the i-th component of linear momentum should, therefore, be of the form

$$\langle S|p_i|S\rangle = \int d\mathbf{k}\,\hbar k_i\,\mathbf{f}^*(\mathbf{k})\cdot\mathbf{f}(\mathbf{k}).$$


This expectation value must correspond to the linear momentum of the classical field, which is given by

$$\mathbf{p} = \frac{1}{c^2}\int d\mathbf{r}\,\mathbf{E}(\mathbf{r})\times\mathbf{H}(\mathbf{r}). \tag{8.28}$$


The integral in Eq. (8.28) may be transformed in a manner similar to that used for the energy integral. The details of the transformation are given in Appendix VI. The results of this transformation yield, for each component p_i of the linear momentum p,

$$p_i = \frac{1}{c^2}\int d\mathbf{r}\,[\mathbf{E}\times\mathbf{H}]_i = \int d\mathbf{k}\,\mathbf{f}^*(\mathbf{k})\cdot\hbar k_i\,\mathbf{f}(\mathbf{k}). \tag{8.29}$$

Thus, one can, with the normalization of Eq. (8.27), identify the linear momentum operator in k-space as ħk.

Let us summarize. We have seen that in momentum space we may define an acceptable state function f(k, t), as given by Eq. (8.21). This function obeys a Schrödinger equation and admits a probability interpretation for the bracket product. The energy and linear momentum operators for this state function have been identified. The remaining problem of angular momentum and its operators is taken up in the next chapter.


PROBLEMS 


8.1 Show that the motion in momentum space of a classical simple harmonic 
oscillator is an ellipse centered on the origin. Show that the area of the ellipse is 
proportional to the total energy of the oscillator. 




8.2 Consider a very large number of classical particles, each of mass m, which move with constant velocities along the x-axis. At t = 0 those particles located by x-coordinates with X > x > −X and with velocities satisfying V > v > −V are represented in momentum space by points lying within a rectangle of area 4mVX. Show that at some later time the area containing these points in momentum space has become distorted into a parallelogram but that the area containing them is still 4mVX.


8.3 By transforming Maxwell's equations to momentum space obtain Eqs. 
(8.9), (8.10), and (8.11). 


8.4 If f(k, t) is a solution of either Eq. (8.17) or Eq. (8.18), show that f(−k, t) is a solution of the same equation and f*(k, t) is a solution of the other.


8.5 Show that ℋ(k, t) is given in terms of f(k, t) by

$$\mathcal{H}(\mathbf{k},t) = \frac{A}{\mu kc}\,\mathbf{k}\times[\mathbf{f}(\mathbf{k},t) - \mathbf{f}^*(-\mathbf{k},t)].$$


8.6 If a particle of mass m which is described by a scalar state function ψ(r) possesses a potential energy V(r) as well as kinetic energy, explain why you would expect ψ(r) to obey a Schrödinger equation of the form

$$i\hbar\frac{\partial\psi}{\partial t} = -\frac{\hbar^2}{2m}\nabla^2\psi + V(\mathbf{r})\psi.$$

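8.7 Consider the plane wave electric field defined by

$$\mathbf{E}(\mathbf{r},t) = E_0\cos(kz - \omega t)\,\mathbf{e}_x.$$

For this field show by the inverse of Eq. (8.7) that

$$\mathcal{E}(\mathbf{k},t) = \frac{(2\pi)^{3/2}E_0}{2}\,\delta(k_x)\,\delta(k_y)\,[\delta(k_z - k)e^{-i\omega t} + \delta(k_z + k)e^{i\omega t}]\,\mathbf{e}_x,$$

and, hence,

$$\mathbf{f}(\mathbf{k},t) = \frac{(2\pi)^{3/2}E_0}{2A}\,\delta(k_x)\,\delta(k_y)\,\delta(k_z - k)\,e^{-i\omega t}\,\mathbf{e}_x,$$

where A is given by Eq. (8.27). What difficulty is encountered in attempting to interpret these functions as single-photon functions?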


8.8 Show that if the electric field is of the form

$$\mathbf{E}(\mathbf{r},t) = \mathbf{E}_0(\mathbf{r})\cos\omega t,$$

the state function is of the form

$$\mathbf{f}(\mathbf{k},t) = \mathbf{f}_0(\mathbf{k})\,e^{-i\omega t}.$$

Show that the converse is not necessarily true.


8.9 It is often convenient to demand that a photon be found within some given 
finite (although perhaps large) volume of space. A simple example is a cubic box 
of side L, with one corner at the origin. As a formal mathematical device one may 
then stack these boxes side by side to fill all of space, with one photon in each box. 


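Since the electric and magnetic fields are identical at equivalent points in different boxes, these fields are periodic in space, that is,

$$\mathbf{E}(\mathbf{r}) = \mathbf{E}(\mathbf{r} + \mathbf{L}),$$

where

$$\mathbf{L} = L(n_1\mathbf{e}_1 + n_2\mathbf{e}_2 + n_3\mathbf{e}_3),$$

with n_1, n_2, and n_3 being positive or negative integers or zero. Show that this condition implies

$$\frac{1}{(2\pi)^{3/2}}\int d\mathbf{k}\,\mathcal{E}(\mathbf{k})\,e^{i\mathbf{k}\cdot\mathbf{r}}[e^{i\mathbf{k}\cdot\mathbf{L}} - 1] = 0,$$

$$\mathcal{E}(\mathbf{k})[e^{i\mathbf{k}\cdot\mathbf{L}} - 1] = 0,$$

and, hence, that ℰ(k) is zero except for those values of k of the form

$$\mathbf{k}_m = \frac{2\pi}{L}(m_1\mathbf{e}_1 + m_2\mathbf{e}_2 + m_3\mathbf{e}_3),$$

where the m_i are integers or zero.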


8.10 Proceeding from the results of the previous problem, show that if Eq. (8.7) is to remain meaningful, ℰ must take on the form

$$\mathcal{E}(\mathbf{k},t) = \sum_m \mathcal{E}_m(t)\,\delta(\mathbf{k} - \mathbf{k}_m),$$

leading to

$$\mathbf{E}(\mathbf{r},t) = \frac{1}{(2\pi)^{3/2}}\sum_m \mathcal{E}_m(t)\,e^{i\mathbf{k}_m\cdot\mathbf{r}},$$

where the sums are over all allowed values of k_m and ℰ_m(t) is given by

$$\mathcal{E}_m(t) = \frac{(2\pi)^{3/2}}{L^3}\int d\mathbf{r}\,\mathbf{E}(\mathbf{r},t)\,e^{-i\mathbf{k}_m\cdot\mathbf{r}},$$

where the integral is taken over the volume of the box of side L. Thus, the continuous function ℰ(k, t) is replaced by the discrete set of functions ℰ_m(t), one function for each allowed value of k_m. Likewise, the continuous function f(k, t) must be replaced by a set of discrete functions f_m(t) given by

$$\mathbf{f}_m(t) = \frac{1}{2A}\left[\mathcal{E}_m(t) - \frac{\dot{\mathcal{E}}_m(t)}{ik_mc}\right].$$


8.11 Let the state function f(k, t) represent a monochromatic superposition of the form f(k, t) = f(k)e^{−iωt}. If we assume that the energy content of the electromagnetic field over all space is equal to the energy, ħω, of one photon, show that Eq. (8.26) reduces to

$$\hbar\omega = \hbar\omega\int d\mathbf{k}\,\mathbf{f}^*(\mathbf{k})\cdot\mathbf{f}(\mathbf{k}),$$

and that the normalization ∫dk f*·f = 1 of the state function in momentum space corresponds to one photon in real space.




8.12 Follow through the derivation leading from Eq. (8.26) to Eq. (8.27) in the case outlined in Problems 8.9 and 8.10, and show that in the monochromatic case, with

$$\mathcal{E}_m(t) = \mathcal{E}_m e^{-i\omega t},$$

the assumption of one photon of energy ħω within each box of side L leads to

$$\hbar\omega = \hbar\omega\left(\frac{L}{2\pi}\right)^3\sum_m \mathbf{f}_m^*\cdot\mathbf{f}_m$$

and, hence, that the normalization

$$\sum_m \mathbf{f}_m^*\cdot\mathbf{f}_m = \left(\frac{2\pi}{L}\right)^3$$

in momentum space corresponds to one photon per box in real space. This procedure, for obvious reasons, is known as box normalization.


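8.13 Consider the wave of Problem 8.7,

$$\mathbf{E}(\mathbf{r},t) = E_0\cos(kz - \omega t)\,\mathbf{e}_x,$$

from the viewpoint of box normalization, where k must now be of the form k = 2πm/L and as a vector is given by k = (2πm/L)e_z. Show that there are only two nonzero ℰ_m(t), given by

$$\mathcal{E}_{\pm m}(t) = (2\pi)^{3/2}\,\frac{E_0}{2}\,e^{\mp i\omega t}\,\mathbf{e}_x,$$

and, hence, only one nonzero f_m(t):

$$\mathbf{f}_m(t) = (2\pi)^{3/2}\,\frac{E_0}{2}\left(\frac{\hbar\omega}{2\epsilon}\right)^{-1/2}e^{-i\omega t}\,\mathbf{e}_x.$$

Show that box normalization leads to

$$E_0 = \left(\frac{2\hbar\omega}{\epsilon L^3}\right)^{1/2}.$$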


CHAPTER 9 


ANGULAR MOMENTUM 


Following the introduction of the photon in Chapter 2, its particle nature 
was justified by the absorption in a detector of discrete bundles of energy 
within a very small region of space and within a very short interval of time. 
Intimately associated with this absorption of energy were other measurable 
quantities. In particular, linear and angular momentum have been discussed. 
Of these, angular momentum, within the Stokes parameter formalism, was 
used to introduce the entire scheme of quantum mechanics. 

Perhaps it has been noticed that since that point the topic of angular 
momentum has been scrupulously avoided. In neither the electromagnetic 
representation of the classical field nor the quantum description in momen- 
tum space was it discussed, although both energy and linear momenta were 
considered. The reason is that angular momentum cannot be adequately 
considered until both the classical and quantum representations are available. 
In this chapter we shall examine the angular momentum of light from both 
viewpoints. 


9.1 ANGULAR MOMENTUM OF THE ELECTROMAGNETIC FIELD 


To repeat what has been given before, if a classical particle possesses linear momentum p, then, with respect to an origin from which the position r is measured, its angular momentum is given by r × p. Thus, if a physical entity, e.g., the electromagnetic field, possesses a linear momentum density g, the angular momentum density, computed with respect to the origin, would be r × g. The total angular momentum of the electromagnetic field is then

$$\mathbf{J} = \int d\mathbf{r}\,\mathbf{r}\times\mathbf{g}. \tag{9.1}$$

From Eq. (7.25) this can be written

$$\mathbf{J} = \int d\mathbf{r}\,\mathbf{r}\times\left(\frac{\mathbf{E}\times\mathbf{H}}{c^2}\right). \tag{9.2}$$




For the plane wave states which were considered in Chapter 7, E and H 
were mutually perpendicular and their cross product was parallel to the 
direction of propagation k. Thus, the linear momentum density for such 
waves is also in the direction of propagation. It follows that for such waves 
r X g must be perpendicular to the direction of propagation, and the total 
angular momentum of a plane wave can have no component of angular 
momentum parallel to the direction of propagation. 

This leads to an apparent contradiction in our development. We know 
that circularly polarized states carry angular momentum whose direction 
is along the direction of propagation. And yet, in Chapter 7 we represented 
these states by superpositions of plane waves which cannot carry angular 
momentum whose direction is along the direction of propagation. Obviously, 
there is a contradiction. 

One way out of this dilemma is to realize that perfect plane waves which 
fill all of space do not exist. Hence, we should not worry about that which 
cannot be observed. This reasoning has some merit, perhaps, but it must be 
remembered that there is nothing within the classical framework which 
prohibits us from approximating a plane wave as closely as we wish. We 
proceed to examine the angular momentum of approximately plane waves. 

Recall that the plane wave representation of right and left circularly polarized light is given in terms of the electric intensity vector by

$$\mathbf{E} = E_0[e^{\mp i\pi/4}\mathbf{e}_x + e^{\pm i\pi/4}\mathbf{e}_y]\,e^{i(kz-\omega t)}. \tag{9.3}$$

Since for plane waves k × E ∝ H and √ε|E| = √μ|H|, the magnetic vector for this wave is given by

$$\mathbf{H} = H_0[-e^{\pm i\pi/4}\mathbf{e}_x + e^{\mp i\pi/4}\mathbf{e}_y]\,e^{i(kz-\omega t)} = \mp i\sqrt{\frac{\epsilon}{\mu}}\,\mathbf{E}. \tag{9.4}$$

E_0 and H_0 in Eqs. (9.3) and (9.4) are constants. This wave propagates in the z-direction and uniformly fills all of space. We will now adjust this solution so that E and H are nonzero only within some finite, although perhaps large, region around the z-axis. We accomplish this by allowing E_0 to be a function of x and y but not of z. Furthermore, for simplicity we assume symmetry about the z-axis, so that E_0 depends only on the distance from the z-axis, r = √(x² + y²). We also assume that the function is slowly varying in r, in a sense to be made more precise, and goes to zero outside some finite cylinder of radius R_0 about the z-axis. We thus have

$$\mathbf{E} = E_0(x,y)[e^{\mp i\pi/4}\mathbf{e}_x + e^{\pm i\pi/4}\mathbf{e}_y]\,e^{i(kz-\omega t)}, \tag{9.5}$$

$$\mathbf{H} = \mp i\sqrt{\frac{\epsilon}{\mu}}\,\mathbf{E}. \tag{9.6}$$


We would now have a realistic model of a light beam of finite cross section if it were not for the fact that the functions defined by Eqs. (9.5) and (9.6) do not satisfy Maxwell's equations. To see this we recall Eq. (7.11):

$$\nabla\times\mathbf{H} = \epsilon\frac{\partial\mathbf{E}}{\partial t}.$$

Using the time dependence of Eq. (9.5) and replacing H by E through Eq. (9.6), we obtain

$$\nabla\times\mathbf{E} = \pm k\mathbf{E}. \tag{9.7}$$


From Eq. (9.5), E, on the right-hand side of Eq. (9.7), has no z-component. By direct computation we find the z-component of the left-hand side of Eq. (9.7) to be nonzero and given by

$$(\nabla\times\mathbf{E})_z = \left[e^{\pm i\pi/4}\frac{\partial E_0}{\partial x} - e^{\mp i\pi/4}\frac{\partial E_0}{\partial y}\right]e^{i(kz-\omega t)}. \tag{9.8}$$
We thus see that the fields given by Eqs. (9.5) and (9.6) do not satisfy Maxwell's equations because they lack z-components. We attempt a second approximate solution by explicitly including nonzero components E_z and H_z. It is first convenient to rewrite Eqs. (9.5) and (9.6) in cylindrical coordinates, as shown in Fig. 9.1. The direction of propagation is kept along the z-axis. The unit vectors e_r and e_θ are shown in Fig. 9.2, and it is easily shown that they are related to the Cartesian unit vectors e_x and e_y through

$$\mathbf{e}_x = \cos\theta\,\mathbf{e}_r - \sin\theta\,\mathbf{e}_\theta,\qquad \mathbf{e}_y = \sin\theta\,\mathbf{e}_r + \cos\theta\,\mathbf{e}_\theta. \tag{9.9}$$


The combinations

$$e^{\mp i\pi/4}\mathbf{e}_x + e^{\pm i\pi/4}\mathbf{e}_y = e^{\mp i\pi/4}[\mathbf{e}_x \pm i\mathbf{e}_y]$$

Fig. 9.1. System of cylindrical coordinates. 




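then become

$$e^{\mp i\pi/4}[\mathbf{e}_x \pm i\mathbf{e}_y] = e^{\mp i\pi/4}[(\cos\theta \pm i\sin\theta)\mathbf{e}_r \pm i(\cos\theta \pm i\sin\theta)\mathbf{e}_\theta] = e^{\mp i\pi/4}[\mathbf{e}_r \pm i\mathbf{e}_\theta]\,e^{\pm i\theta}. \tag{9.10}$$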
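The identity (9.10), which converts the circular combinations of Cartesian unit vectors into cylindrical ones, can be spot-checked numerically. In this sketch the variable s stands for the upper (+1) or lower (−1) sign, e_r and e_θ are written in Cartesian components by inverting Eq. (9.9), and the sample angles are arbitrary. The common phase e^{∓iπ/4} multiplies both sides and is omitted.

```python
import cmath
import math


def cyl_unit_vectors(theta):
    """Cartesian components of e_r and e_theta at azimuth theta (inverse of Eq. (9.9))."""
    e_r = (math.cos(theta), math.sin(theta), 0.0)
    e_th = (-math.sin(theta), math.cos(theta), 0.0)
    return e_r, e_th


def identity_error(theta, s):
    """Max component difference between e_x + s i e_y and exp(s i theta)(e_r + s i e_theta)."""
    e_r, e_th = cyl_unit_vectors(theta)
    lhs = (1.0, s * 1j, 0.0)                       # e_x + s i e_y in Cartesian components
    phase = cmath.exp(s * 1j * theta)
    rhs = tuple(phase * (a + s * 1j * b) for a, b in zip(e_r, e_th))
    return max(abs(a - b) for a, b in zip(lhs, rhs))


worst = max(identity_error(th, s) for th in (0.0, 0.4, 1.3, 2.9, 5.0) for s in (+1, -1))
print(f"largest deviation: {worst:.1e}")
```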

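Our second approximate solution is then written

$$\mathbf{E}' = [E_0(r)(\mathbf{e}_r \pm i\mathbf{e}_\theta) + f(r)\mathbf{e}_z]\,e^{i\phi}, \tag{9.11}$$

$$\mathbf{H}' = \mp i\sqrt{\frac{\epsilon}{\mu}}\,\mathbf{E}', \tag{9.12}$$

where

$$\phi = kz - \omega t \pm \theta \mp \pi/4. \tag{9.13}$$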


Because of the proportionality between E' and H' given by Eq. (9.12) and the form of the time dependence, Maxwell's equations reduce to the two independent equations:

$$\nabla\times\mathbf{E}' = \pm k\mathbf{E}', \tag{9.14}$$

$$\nabla\cdot\mathbf{E}' = 0. \tag{9.15}$$


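We impose Eq. (9.15) first. Writing the divergence in cylindrical coordinates, we obtain

$$\nabla\cdot\mathbf{E}' = \frac{1}{r}\frac{\partial}{\partial r}(rE'_r) + \frac{1}{r}\frac{\partial E'_\theta}{\partial\theta} + \frac{\partial E'_z}{\partial z} = 0, \tag{9.16}$$

where, by Eq. (9.11),

$$E'_r = E_0(r)\,e^{i\phi},\qquad E'_\theta = \pm iE_0(r)\,e^{i\phi},\qquad E'_z = f(r)\,e^{i\phi}. \tag{9.17}$$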




Fig. 9.2. The relation between the unit vectors of cylindrical and Cartesian 
coordinates. 




Performing the differentiations, one finds that Eq. (9.15) will be satisfied if and only if

$$\frac{\partial E_0}{\partial r} + ikf(r) = 0. \tag{9.18}$$

Thus, the wave will have nonvanishing components in the direction of propagation, i.e., f(r) ≠ 0, only in those regions where E_0(r) is not constant.
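Condition (9.18) can be checked by evaluating the divergence (9.16) numerically. The sketch below assumes a hypothetical Gaussian profile E_0(r) = exp(−r²/w²), units with c = 1, and illustrative values of k and w; it builds the components (9.17) with f(r) = (i/k) dE_0/dr, which is Eq. (9.18) solved for f, and compares the result with the same field after E'_z has been dropped.

```python
import cmath
import math

K = 50.0          # hypothetical wavenumber (c = 1, so omega = k)
OMEGA = K
W0 = 1.0          # hypothetical beam width
S = +1            # +1 / -1 selects the upper / lower signs of Eqs. (9.11)-(9.13)


def E0(r):
    """Hypothetical slowly varying, real amplitude profile (a Gaussian)."""
    return math.exp(-(r / W0) ** 2)


def dE0_dr(r):
    return -2.0 * r / W0 ** 2 * E0(r)


def components(r, theta, z, include_fz=True, t=0.0):
    """(E'_r, E'_theta, E'_z) of Eq. (9.17), with f(r) = (i/k) dE0/dr from Eq. (9.18)."""
    phi = K * z - OMEGA * t + S * theta - S * math.pi / 4      # Eq. (9.13)
    p = cmath.exp(1j * phi)
    f = (1j / K) * dE0_dr(r) if include_fz else 0.0
    return E0(r) * p, S * 1j * E0(r) * p, f * p


def divergence(r, theta, z, include_fz=True, h=1e-4):
    """Eq. (9.16) evaluated with central differences."""
    d_rEr = ((r + h) * components(r + h, theta, z, include_fz)[0]
             - (r - h) * components(r - h, theta, z, include_fz)[0]) / (2 * h)
    d_Eth = (components(r, theta + h, z, include_fz)[1]
             - components(r, theta - h, z, include_fz)[1]) / (2 * h)
    d_Ez = (components(r, theta, z + h, include_fz)[2]
            - components(r, theta, z - h, include_fz)[2]) / (2 * h)
    return d_rEr / r + d_Eth / r + d_Ez


point = (0.7, 0.3, 0.1)
print("with E'_z from (9.18):", abs(divergence(*point)))
print("without E'_z:         ", abs(divergence(*point, include_fz=False)))
```

Without the longitudinal component the divergence reduces to (dE_0/dr)e^{iφ}, which is exactly the term that Eq. (9.18) is designed to cancel.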
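Returning to Eq. (9.14), we write out the left-hand side in cylindrical coordinates:

$$\nabla\times\mathbf{E}' = \left(\frac{1}{r}\frac{\partial E'_z}{\partial\theta} - \frac{\partial E'_\theta}{\partial z}\right)\mathbf{e}_r + \left(\frac{\partial E'_r}{\partial z} - \frac{\partial E'_z}{\partial r}\right)\mathbf{e}_\theta + \frac{1}{r}\left(\frac{\partial}{\partial r}(rE'_\theta) - \frac{\partial E'_r}{\partial\theta}\right)\mathbf{e}_z. \tag{9.19}$$

Making use of Eq. (9.17), we obtain

$$(\nabla\times\mathbf{E}')_r = \pm kE'_r\left(1 + \frac{if}{krE_0}\right),$$

$$(\nabla\times\mathbf{E}')_\theta = \pm kE'_\theta\left(1 + \frac{i}{kE_0}\frac{df}{dr}\right),$$

$$(\nabla\times\mathbf{E}')_z = \pm kE'_z.$$

Thus, the z-component of Eq. (9.14) is satisfied identically, but there are errors in the r and θ components, given by

$$\frac{if}{krE_0}\qquad\text{and}\qquad\frac{i}{kE_0}\frac{df}{dr},$$

which, by use of Eq. (9.18), may be written

$$-\frac{1}{rk^2E_0}\frac{\partial E_0}{\partial r}\qquad\text{and}\qquad -\frac{1}{k^2E_0}\frac{\partial^2E_0}{\partial r^2}. \tag{9.20}$$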

E_0(r) was specified as a slowly varying function. We now make this more precise. By slowly varying we mean that the derivatives of E_0(r) are so small that the error terms above are very small compared to 1 and may be neglected. Examples of acceptable functional forms for E_0 will be given later. Thus, E' and H' are a second approximation to a solution of Maxwell's equations, representing a circularly polarized wave contained within a finite cylinder about the z-axis.
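To see what "slowly varying" means in practice, the two error terms of Eq. (9.20) can be evaluated for a sample profile. The text's own examples appear later, so this sketch substitutes a hypothetical Gaussian E_0(r) = exp(−r²/w²), for which the terms equal 2/(kw)² and (2 − 4r²/w²)/(kw)²; the derivatives are taken by central differences. Both terms fall off as 1/(kw)², so a beam many wavelengths wide satisfies the approximation comfortably.

```python
import math


def error_terms(E0, r, k, h=1e-4):
    """The two neglected terms of Eq. (9.20) at radius r, via central differences."""
    d1 = (E0(r + h) - E0(r - h)) / (2 * h)
    d2 = (E0(r + h) - 2 * E0(r) + E0(r - h)) / h ** 2
    return -d1 / (r * k ** 2 * E0(r)), -d2 / (k ** 2 * E0(r))


W_BEAM = 1.0                                        # hypothetical beam width w


def gaussian(r):
    return math.exp(-(r / W_BEAM) ** 2)             # hypothetical profile E_0(r)


for k in (10.0, 1000.0):
    err_r, err_theta = error_terms(gaussian, 0.5, k)
    print(f"k w = {k * W_BEAM:6.0f}:  r-term {err_r:.2e},  theta-term {err_theta:.2e}")
```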

In Fig. 9.3 we plot an acceptable function E_0(x, y) = E_0(r). We have explicitly made E_0 constant over a large central region of the wave and confined the variation of the function from this constant value to zero to lie within a "skin" of thickness δ which lies a distance R_0 from the axis. From Eq. (9.18) the electric and magnetic fields can have a nonzero z-component only within the skin region of this wave. Having z-components within this region implies the possibility of a nonzero z-component of angular momentum within this region. Since the amplitude E_0 is zero outside the skin and constant inside it, the skin region is the only one in which the z-component of angular momentum does not vanish.

Fig. 9.3. The electric field amplitude and the angular momentum density across a cylindrical beam.

Without specifying the exact form of E_0(r), it is, of course, impossible to calculate the total angular momentum of the wave. However, it is possible to calculate the ratio of the total z-component of angular momentum, per unit length, of this wave to the total energy per unit length.

This calculation is most readily carried out in terms of the real components of the fields. These may easily be obtained by referring to Eqs. (9.11), (9.12), and (9.18), and are listed below. E_0(r) is assumed to be a real function.

$$\begin{array}{ll}
\text{Field component} & \text{Real part of component}\\
E'_r & E_0\cos\phi\\
E'_\theta & \mp E_0\sin\phi\\
E'_z & -\dfrac{1}{k}\dfrac{\partial E_0}{\partial r}\sin\phi\\
H'_r & \pm E_0\sqrt{\epsilon/\mu}\,\sin\phi\\
H'_\theta & E_0\sqrt{\epsilon/\mu}\,\cos\phi\\
H'_z & \pm\dfrac{1}{k}\dfrac{\partial E_0}{\partial r}\sqrt{\epsilon/\mu}\,\cos\phi
\end{array} \tag{9.21}$$




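Written in cylindrical coordinates, the z-component of angular momentum, as given by Eq. (9.2), is

$$J_z = \iiint \frac{1}{c^2}[E'_zH'_r - E'_rH'_z]\,r^2\,dr\,d\theta\,dz.$$

When the real parts of the fields are inserted, this becomes

$$J_z = \mp\frac{1}{\omega}\iiint \frac{\partial}{\partial r}\left(\tfrac{1}{2}\epsilon E_0^2\right)r^2\,dr\,d\theta\,dz.$$

If the integration over r is performed by parts, we obtain

$$\int_0^\infty \frac{\partial}{\partial r}\left(\tfrac{1}{2}\epsilon E_0^2\right)r^2\,dr = \tfrac{1}{2}\epsilon E_0^2r^2\Big|_0^\infty - \int_0^\infty \epsilon E_0^2\,r\,dr.$$

The first term vanishes at both limits: at r = 0 because of the factor r², and at large r because E_0(r) is assumed to go to zero for r larger than R_0. We then have

$$J_z = \pm\frac{1}{\omega}\iiint \epsilon E_0^2\,r\,dr\,d\theta\,dz, \tag{9.22}$$

where the plus and minus refer to right and left circular states. In a completely analogous manner the energy content of the field is computed by use of Eq. (7.23) and the real field components from Eqs. (9.21), and is found to be

$$W = \iiint \epsilon E_0^2\left[1 + \frac{1}{2}\frac{1}{k^2E_0^2}\left(\frac{\partial E_0}{\partial r}\right)^2\right]r\,dr\,d\theta\,dz. \tag{9.23}$$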


Just as the error terms given by Eq. (9.20) were assumed to be small enough compared to unity to be ignored, so we also ignore the term

$$\frac{1}{2}\frac{1}{k^2E_0^2}\left(\frac{\partial E_0}{\partial r}\right)^2,$$

assuming that it is also small compared to unity. A specific example in which both of these approximations are obviously justified is given in the problems.


Finally, the ratio of the energy content of the wave to the component of angular momentum along the direction of propagation is

$$\frac{W}{J_z} = \pm\omega, \tag{9.24}$$

which is consistent with the quantum viewpoint of photons each carrying energy ħω and angular momentum ±ħ.
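Eqs. (9.22) through (9.24) can be exercised numerically for a concrete profile. This sketch again assumes a hypothetical Gaussian E_0(r) = exp(−r²/w²) (it is not strictly zero beyond a finite R_0, so the upper limit R_MAX simply truncates a negligible tail), units with ε = c = 1, and the upper (right-circular) sign. It computes J_z per unit length before and after the integration by parts, and the energy W per unit length with the small gradient term retained. For this profile the retained term gives W/(ωJ_z) = 1 + 1/(kw)², which approaches the value of Eq. (9.24) as the beam becomes wide compared with a wavelength.

```python
import math

EPS, C = 1.0, 1.0          # hypothetical units
W_BEAM, K = 1.0, 20.0      # hypothetical Gaussian width w and wavenumber k
OMEGA = K * C


def E0(r):
    return math.exp(-(r / W_BEAM) ** 2)       # hypothetical real profile E_0(r)


def dE0(r):
    return -2.0 * r / W_BEAM ** 2 * E0(r)     # its analytic radial derivative


def simpson(g, a, b, n=2000):
    """Composite Simpson rule with an even number n of intervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    total += 4 * sum(g(a + i * h) for i in range(1, n, 2))
    total += 2 * sum(g(a + i * h) for i in range(2, n, 2))
    return total * h / 3


R_MAX = 8 * W_BEAM   # E_0 is negligible beyond this radius

# Per unit length in z, upper (right-circular) sign; the factor 2*pi comes from d(theta).
J_z_by_parts = -2 * math.pi / OMEGA * simpson(lambda r: EPS * E0(r) * dE0(r) * r ** 2, 0.0, R_MAX)
J_z = 2 * math.pi / OMEGA * simpson(lambda r: EPS * E0(r) ** 2 * r, 0.0, R_MAX)        # Eq. (9.22)
W = 2 * math.pi * simpson(
    lambda r: EPS * (E0(r) ** 2 + dE0(r) ** 2 / (2 * K ** 2)) * r, 0.0, R_MAX)           # Eq. (9.23)

print("J_z before / after integration by parts:", J_z_by_parts, J_z)
print("W / (omega J_z) =", W / (OMEGA * J_z), " vs 1 + 1/(k w)^2 =", 1 + 1 / (K * W_BEAM) ** 2)
```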




Perhaps there now appears to be no dilemma, for, as we have seen, when a more realistic approximation to a plane circularly polarized wave is used, we find the correct angular momentum content. However, let us plot the angular momentum density predicted by our results, rather than consider the total angular momentum content of the wave. As we have mentioned, the z-component of angular momentum can be nonzero only in those regions in which the fields themselves have nonzero z-components. Once again referring to Eq. (9.18), we see that the field can have nonzero z-components only in those regions where E_0(r) varies. Thus, the total angular momentum content of the wave lies within the skin region where E_0 is nonconstant.

Now consider the following thought experiment. A small circular absorbing disk, which is free to rotate about the z-axis, is placed in the center of the beam. The radius, r_0, of the disk is indicated in Fig. 9.3, with r_0 ≪ R_0. Assuming that the beam is in a circular polarization state, should we expect the small disk to rotate? There are two possible answers: "yes" and "no". No, it will not rotate, for although it absorbs energy from the beam, the classical angular momentum density is zero in the central region and, hence, no angular momentum can be taken up by the disk. Yes, it will rotate, for each photon in the beam is in the same circular state and carries angular momentum in the direction of propagation, plus or minus ħ as the beam is right or left circularly polarized. Once again there seems to be a contradiction.

Let us show that there is no contradiction in expecting the disk to rotate. By placing the small disk in the beam we are attempting to measure, or observe, some property of the beam. In this case the property is angular momentum. In any measurement there must be an interaction between the measuring instrument and the observed system. In many cases this interaction is small enough to be ignored, but in this case it has a gross effect, for we are removing from the beam the central section, which is absorbed by the disk. This is indicated in Fig. 9.4, where we have plotted the intensity across the beam just behind the disk. The variation of the fields at radius r_0 behind the disk is much too rapid for the previous results for approximate plane waves to be applied quantitatively. However, we can attempt to apply them in a qualitative manner. Because the fields are varying as a function of r in this region, Eq. (9.18) tells us that we can expect to find components in this region along the direction of propagation. If the fields have nonzero components in the z-direction, we expect to find a nonzero component of angular momentum along the direction of propagation. In the region around r_0 the fields are increasing as a function of r, whereas at R_0 they are decreasing. Thus, the derivatives of these functions with respect to r will have opposite signs in these two regions. It follows from Eq. (9.18) that the z-components of the fields and, hence, the z-component of the angular momentum density will also have opposite signs in these two regions. This is indicated in Fig. 9.4.

Fig. 9.4. The intensity and the angular momentum density across a cylindrical beam after an absorbing disk has been placed in the center of the beam.

Thus, the inner skin of the wave has picked up an angular momentum opposite to that of the outer skin. In order to conserve total angular momentum, the disk which absorbed this inner part of the wave must rotate with angular momentum opposite to that of the inner skin, that is, in the direction of the incident beam's angular momentum.

Several important points result from our discussion. First, and perhaps most interesting, is that a classical quantity associated with the electromagnetic field does not necessarily indicate the value of that quantity which will be measured. This is clear from our example. The angular momentum density of the wave was zero at the center, yet when we attempted to measure it there, the classical fields adjusted themselves and produced a nonzero measurement which must be in accord with the quantum result W/J_z = ±ω. In the preceding thought experiment we have neglected the niceties of diffraction theory. This was done simply because the exact diffraction problem is of great complexity. However, in every classical calculation on realistic fields, the ratio of energy content to angular momentum content for the entire wave is consistent with the quantum relations E = ħω and J_z = ±ħ.

The second point follows directly from the last statement. The classical field concept and the quantum viewpoint give consistent results if we consider large volumes of the classical fields. The classical field does not include the concept of the photon as a localizable particle. Hence, apparent contradictions arise when we compare these theories within very small volumes or at particular points in space. For this reason a description of angular momentum in k-space proves fruitful. In k-space we suppress the explicit space dependence of the state function.

Before discussing the angular momentum of the photon in momentum 
space, we wish to examine in the following sections some fundamental aspects 
of angular momentum which are found in all systems whatever the nature 
of their state function might be. 




9.2 THE ANGULAR MOMENTUM OPERATORS 


The role of angular momentum in quantum mechanics is of such importance 
that before studying the angular momentum of the photon we shall present 
some very general results for the angular momentum of any system. 

Quantum mechanics associates with each observable an operator whose eigenvalues yield the measurable values of the observable. We begin by examining the angular momentum operator associated with a particle which possesses a scalar state function. Classically, as we have seen, the angular momentum, L, of a point particle is given in terms of its linear momentum, p, by

$$\mathbf{L} = \mathbf{r}\times\mathbf{p},$$

where r is the location of the particle, as measured from some particular origin. This may be translated into the language of quantum mechanics by replacing p by the operator p_op = −iħ∇, and we obtain

$$\mathbf{L}_{\text{op}} = -i\hbar\,\mathbf{r}\times\nabla. \tag{9.25}$$


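This, of course, represents three operators, one for each component of angular momentum. In Cartesian coordinates these operators are given by

$$\begin{aligned}
L_x &= -i\hbar\left(y\frac{\partial}{\partial z} - z\frac{\partial}{\partial y}\right),\\
L_y &= -i\hbar\left(z\frac{\partial}{\partial x} - x\frac{\partial}{\partial z}\right),\\
L_z &= -i\hbar\left(x\frac{\partial}{\partial y} - y\frac{\partial}{\partial x}\right).
\end{aligned} \tag{9.26}$$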


In addition to the component operators, we may also define an operator for the square of the magnitude of the total angular momentum by

$$L^2 = \mathbf{L}\cdot\mathbf{L}. \tag{9.27}$$


One may, of course, set up eigenvalue equations for these operators immediately. However, by so doing a great deal of generality is lost. Consider the commutation relations which are obeyed by the four operators given above. It is left to the problems to show that

$$\begin{aligned}
&[L_x, L_y] = i\hbar L_z,\qquad [L_y, L_z] = i\hbar L_x,\qquad [L_z, L_x] = i\hbar L_y,\\
&[L^2, L_x] = [L^2, L_y] = [L^2, L_z] = 0.
\end{aligned} \tag{9.28}$$
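The relations (9.28) are left to the problems; as a numerical spot check rather than a proof, the sketch below applies the differential operators (9.26) to a hypothetical scalar function by nested central differences, in units with ħ = 1, and compares [L_x, L_y]ψ with iL_zψ and [L², L_z]ψ with zero at one arbitrary point. The step size H and the test function psi are illustrative choices.

```python
H = 1e-3   # finite-difference step (an illustrative choice)


def partial(g, axis):
    """Central-difference partial derivative of g with respect to x, y, or z (axis 0, 1, 2)."""
    def dg(p):
        lo, hi = list(p), list(p)
        lo[axis] -= H
        hi[axis] += H
        return (g(tuple(hi)) - g(tuple(lo))) / (2 * H)
    return dg


def L(i, g):
    """Apply L_i = -i hbar (r x grad)_i of Eqs. (9.26), with hbar = 1."""
    j, k = (i + 1) % 3, (i + 2) % 3
    dj, dk = partial(g, j), partial(g, k)
    return lambda p: -1j * (p[j] * dk(p) - p[k] * dj(p))


def L2(g):
    """Apply L^2 = L_x L_x + L_y L_y + L_z L_z, Eq. (9.27)."""
    return lambda p: sum(L(i, L(i, g))(p) for i in range(3))


def psi(p):
    x, y, z = p
    return x ** 2 * y + y * z ** 3 + x * z          # hypothetical scalar state function


q = (0.3, -0.7, 1.1)
commutator = L(0, L(1, psi))(q) - L(1, L(0, psi))(q)   # [L_x, L_y] psi at q
expected = 1j * L(2, psi)(q)                             # i hbar L_z psi at q
l2_lz = L2(L(2, psi))(q) - L(2, L2(psi))(q)             # [L^2, L_z] psi at q
print(commutator, expected, abs(l2_lz))
```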
Now, it happens that Eqs. (9.28) are much more general than Eqs. (9.26). By this is meant that there exist operators which satisfy Eqs. (9.28) but cannot be written in the form of Eqs. (9.26). In fact, we shall see that the angular momentum operators for the photon satisfy Eqs. (9.28), yet we have seen that the photon does not possess a scalar state function and, hence, the operators of Eqs. (9.26) do not apply to the photon. To distinguish them from the particular form of Eqs. (9.26), we therefore define general Hermitian angular momentum operators by their commutation rules:

$$[J_i, J_j] = i\hbar J_k,\qquad [J^2, J_i] = 0,\qquad J_i = J_i^\dagger. \tag{9.29}$$


In the first of these equations, i, j, and k are assumed to be in cyclic order. Since we imply no particular representation for these operators, the operator J² must be interpreted to mean

$$J^2 = J_x^2 + J_y^2 + J_z^2. \tag{9.30}$$


The following section is devoted to exhibiting the eigenvalues of these 
operators. This exhibition is largely an algebraic tour de force. Thus, let us 
postpone the algebra and examine the physics which is involved. 

Equation (9.29) tells us that the operators associated with the various components of angular momentum do not commute among themselves. Referring to the results of Section 5.2, we may not, therefore, define states which are simultaneously eigenstates of two such components. Physically, we may not simultaneously measure two different components of angular momentum with arbitrary accuracy. However, Eq. (9.29) also tells us that the operators associated with the individual components commute with the operator associated with the total angular momentum, J². Thus, for any system we may prepare states which are simultaneously pure states with respect to total angular momentum and the component of angular momentum along some one direction. Without loss of generality we may take this direction to be the z-direction. Then, as we shall see in the next section, Eqs. (9.29) lead directly to the eigenvalue equations


$$J^2|S\rangle = j(j+1)\hbar^2|S\rangle, \qquad (9.31)$$
$$J_z|S\rangle = m_j\hbar|S\rangle, \qquad (9.32)$$

with $j$ taking on integer or half-integer values, $j = 0, \tfrac12, 1, \tfrac32, 2, \ldots$, and, for a given value of $j$, $m_j$ may take on only the $(2j+1)$ values

$$m_j = j,\; j-1,\; j-2,\; \ldots,\; -j+1,\; -j.$$

In conclusion, the quantum mechanical specification of the total angular momentum of a system is given by the two "quantum numbers" $j$ and $m_j$, which may be integers or half-integers. The term "quantum number" is used for any convenient label for the eigenstates of a system. For example, one may refer to a system as having angular momentum $j = 1$, meaning that according to the eigenvalues of Eq. (9.31) the magnitude of the angular momentum of the system is

$$\sqrt{j(j+1)\hbar^2} = \sqrt{2}\,\hbar.$$




9.3 ANGULAR MOMENTUM EIGENVALUES 


We now demonstrate that the commutation rules given by Eqs. (9.29) are sufficient to determine the eigenvalues of the operators $J^2$ and $J_z$, as given in Eqs. (9.31) and (9.32).

We write the eigenstates of these operators as $|\lambda, m\rangle$, where the quantum numbers or labels of these states are related to the eigenvalues of the operators as shown below:

$$J^2|\lambda, m\rangle = \lambda\hbar^2|\lambda, m\rangle, \qquad (9.33)$$
$$J_z|\lambda, m\rangle = m\hbar|\lambda, m\rangle. \qquad (9.34)$$

We can assume a normalization of these states such that

$$\langle\lambda, m|\lambda, m\rangle = 1.$$


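We first show that $\lambda \ge m^2$. Let $|S\rangle$ be any eigenstate $|\lambda, m\rangle$. Using Eq. (9.30), we may write

$$\langle S|J^2|S\rangle = \langle S|J_x^2|S\rangle + \langle S|J_y^2|S\rangle + \langle S|J_z^2|S\rangle,$$

which, by Eqs. (9.33) and (9.34), becomes

$$\lambda\hbar^2 = \langle J_xS|J_xS\rangle + \langle J_yS|J_yS\rangle + \hbar^2m^2,$$

where we have made explicit use of the Hermiticity of the operators to write

$$\langle S|J_x^2|S\rangle = \langle J_xS|J_xS\rangle.$$

The remaining bracket products must be real, nonnegative numbers, since they are of the form $\langle S'|S'\rangle$ with $|S'\rangle = J_x|S\rangle$ or $|S'\rangle = J_y|S\rangle$. It is thus apparent that $\lambda \ge m^2$.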

We next show that, given an acceptable eigenvalue $m$, we may create a sequence of acceptable eigenvalues whose differences are integers. The proof requires the introduction of two new operators, which are defined by

$$J_\pm = J_x \pm iJ_y. \qquad (9.35)$$

It is easy to show that these operators satisfy the commutation relations

$$[J^2, J_\pm] = 0, \qquad (9.36)$$
$$[J_z, J_\pm] = \pm\hbar J_\pm, \qquad (9.37)$$

and obey the identities

$$J_\mp = J_\pm^\dagger, \qquad (9.38)$$
$$J_+J_- = J^2 - J_z^2 + \hbar J_z, \qquad (9.39)$$
$$J_-J_+ = J^2 - J_z^2 - \hbar J_z. \qquad (9.40)$$




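If one writes out the commutation relation Eq. (9.37) in full, one obtains

$$J_zJ_\pm|\lambda, m\rangle - J_\pm J_z|\lambda, m\rangle = \pm\hbar J_\pm|\lambda, m\rangle$$

or

$$J_z\bigl(J_\pm|\lambda, m\rangle\bigr) = \hbar(m \pm 1)\bigl(J_\pm|\lambda, m\rangle\bigr).$$

Thus, the kets $J_\pm|\lambda, m\rangle$ (unless they vanish) are eigenvectors of $J_z$ whose eigenvalues are $\hbar(m \pm 1)$; by Eq. (9.36) they are still eigenvectors of $J^2$ with the same eigenvalue $\lambda\hbar^2$. Therefore, the kets $J_\pm|\lambda, m\rangle$ can differ from the eigenstates $|\lambda, m \pm 1\rangle$ by at most a constant of proportionality. Thus, if we have any eigenvector, we may create a sequence of eigenvectors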


$$|\lambda, m\rangle,\; |\lambda, m+1\rangle,\; |\lambda, m+2\rangle,\; \ldots$$

with increasing values of $m$, by successive application of the operator $J_+$, and a sequence with decreasing values of $m$,

$$|\lambda, m\rangle,\; |\lambda, m-1\rangle,\; |\lambda, m-2\rangle,\; \ldots$$

by application of $J_-$. However, this cannot be carried on indefinitely or the condition $\lambda \ge m^2$ would be violated. Thus, there must exist both a maximum and a minimum $m$ that can be reached. Let us call the maximum value of $m$ $j$ and the minimum value $j'$. For these states we must have


$$J_+|\lambda, j\rangle = 0, \qquad (9.41)$$
$$J_-|\lambda, j'\rangle = 0, \qquad (9.42)$$

or the sequence would not terminate. We therefore have a sequence of eigenstates $|\lambda, m\rangle$ where $m$ varies in integral steps from $j'$ to $j$, with the difference between any two values of $m$ being an integer. The difference between $j$ and $j'$ is therefore integral.

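We now show that the sum of $j$ and $j'$ as well as their difference is integral. By applying $J_-$ to Eq. (9.41) and $J_+$ to Eq. (9.42) and using Eqs. (9.39) and (9.40), we obtain

$$J_-J_+|\lambda, j\rangle = 0 = (\lambda\hbar^2 - j^2\hbar^2 - j\hbar^2)|\lambda, j\rangle, \qquad (9.43)$$
$$J_+J_-|\lambda, j'\rangle = 0 = (\lambda\hbar^2 - j'^2\hbar^2 + j'\hbar^2)|\lambda, j'\rangle. \qquad (9.44)$$

Solving Eq. (9.43) yields

$$\lambda = j(j+1), \qquad (9.45)$$

and, subtracting Eq. (9.43) from (9.44), we find $j^2 + j - j'^2 + j' = (j + j')(j - j' + 1) = 0$. Since $j \ge j'$, the second factor is positive, so that

$$j + j' = 0,$$

which is, of course, an integer. Thus, both the sum and difference of $j$ and $j'$ are integers. This is possible if and only if $j$ and $j'$ are themselves either both integers or both half-integers.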




Therefore, the allowed eigenvalues of the operator $J^2$ are

$$\hbar^2 j(j+1),$$

where $j$ is integral or half-integral, and for each value of $j$ the allowed values of $m$ are the $(2j+1)$ values

$$j,\; j-1,\; j-2,\; \ldots,\; -j+1,\; -j.$$
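As a numerical cross-check of this result, the sketch below builds explicit matrices for $J_x$, $J_y$, $J_z$ and confirms the commutation rules (9.29), the eigenvalue $j(j+1)$ of $J^2$, and the $(2j+1)$ values of $m$. It assumes units with $\hbar = 1$ and takes the ladder matrix element $\langle m+1|J_+|m\rangle = \sqrt{j(j+1) - m(m+1)}$ as given (a standard result not derived in this section); the function names are illustrative only.

```python
import math


def angular_momentum_matrices(j):
    """Return (Jx, Jy, Jz) for quantum number j, in units with hbar = 1.

    Basis ordering is m = j, j-1, ..., -j. The raising-operator elements use
    the standard formula <m+1|J+|m> = sqrt(j(j+1) - m(m+1)).
    """
    dim = int(round(2 * j)) + 1
    ms = [j - n for n in range(dim)]
    jz = [[ms[r] if r == c else 0 for c in range(dim)] for r in range(dim)]
    jp = [[0.0] * dim for _ in range(dim)]
    for c in range(1, dim):  # J+ takes basis state c (m = ms[c]) to state c - 1
        m = ms[c]
        jp[c - 1][c] = math.sqrt(j * (j + 1) - m * (m + 1))
    jm = [[jp[c][r] for c in range(dim)] for r in range(dim)]  # J- = (J+)^dagger, Eq. (9.38)
    jx = [[(jp[r][c] + jm[r][c]) / 2 for c in range(dim)] for r in range(dim)]
    jy = [[(jp[r][c] - jm[r][c]) / (2 * 1j) for c in range(dim)] for r in range(dim)]
    return jx, jy, jz


def matmul(a, b):
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) for c in range(n)] for r in range(n)]


def close(a, b, tol=1e-12):
    n = len(a)
    return all(abs(a[r][c] - b[r][c]) < tol for r in range(n) for c in range(n))


def check(j):
    """Return (rules_hold, list of m values) for the matrices built above."""
    ops = angular_momentum_matrices(j)
    dim = len(ops[2])
    rules_hold = True
    for a, b, c in ((0, 1, 2), (1, 2, 0), (2, 0, 1)):  # cyclic order, Eq. (9.29)
        ab, ba = matmul(ops[a], ops[b]), matmul(ops[b], ops[a])
        comm = [[ab[r][s] - ba[r][s] for s in range(dim)] for r in range(dim)]
        rules_hold &= close(comm, [[1j * v for v in row] for row in ops[c]])
    squares = [matmul(op, op) for op in ops]
    j2 = [[sum(sq[r][s] for sq in squares) for s in range(dim)] for r in range(dim)]
    identity = [[j * (j + 1) if r == s else 0 for s in range(dim)] for r in range(dim)]
    rules_hold &= close(j2, identity)  # J^2 = j(j+1) times the identity, Eq. (9.31)
    return rules_hold, [ops[2][r][r] for r in range(dim)]


for j in (0.5, 1, 1.5, 2):
    print(j, check(j))
```

Each line should report `True` together with the values $j, j-1, \ldots, -j$ on the diagonal of $J_z$.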


9.4 ORBITAL ANGULAR MOMENTUM 


In Chapter 2 the distinction was made between orbital angular momentum and intrinsic or spin angular momentum. In this section we examine the manner in which the orbital angular momentum operator transforms to a $k$-space representation.

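As we have seen, it is natural to write the orbital angular momentum operators, for a particle which possesses a scalar state function, as

$$\mathbf{L}_{\text{op}} = -i\hbar\,\mathbf{r}\times\nabla_r,$$

where

$$\nabla_r = \mathbf{e}_x\frac{\partial}{\partial x} + \mathbf{e}_y\frac{\partial}{\partial y} + \mathbf{e}_z\frac{\partial}{\partial z}. \qquad (9.46)$$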


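The subscript, $r$, on the gradient indicates nothing new but is used here to distinguish this operator from

$$\nabla_k = \mathbf{e}_x\frac{\partial}{\partial k_x} + \mathbf{e}_y\frac{\partial}{\partial k_y} + \mathbf{e}_z\frac{\partial}{\partial k_z}, \qquad (9.47)$$

which may be thought of as a gradient in momentum space. We shall make use of the operator $\nabla_k$ later.

Assuming that a particle has a scalar state function $\psi(\mathbf{r})$, the expectation value of its orbital angular momentum would be given by

$$\langle\mathbf{L}\rangle = -i\hbar\int d\mathbf{r}\,\psi^*(\mathbf{r})[\mathbf{r}\times\nabla_r]\psi(\mathbf{r}). \qquad (9.48)$$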


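We obtain a description of the particle in $k$-space by writing $\psi(\mathbf{r})$ in terms of its Fourier transform:

$$\psi(\mathbf{r}) = \frac{1}{(2\pi)^{3/2}}\int d\mathbf{k}\,e^{i\mathbf{k}\cdot\mathbf{r}}\phi(\mathbf{k}), \qquad (9.49)$$

where $\phi(\mathbf{k})$ is given by

$$\phi(\mathbf{k}) = \frac{1}{(2\pi)^{3/2}}\int d\mathbf{r}\,e^{-i\mathbf{k}\cdot\mathbf{r}}\psi(\mathbf{r}).$$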


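Placing Eq. (9.49) in Eq. (9.48), we obtain

$$\langle\mathbf{L}\rangle = -\frac{i\hbar}{(2\pi)^3}\iiint d\mathbf{k}\,d\mathbf{k}'\,d\mathbf{r}\,\phi^*(\mathbf{k}')e^{-i\mathbf{k}'\cdot\mathbf{r}}[\mathbf{r}\times\nabla_r]\phi(\mathbf{k})e^{i\mathbf{k}\cdot\mathbf{r}}. \qquad (9.50)$$

Using the definitions of Eqs. (9.46) and (9.47), one may verify directly that

$$[\mathbf{r}\times\nabla_r]e^{i\mathbf{k}\cdot\mathbf{r}} = -[\mathbf{k}\times\nabla_k]e^{i\mathbf{k}\cdot\mathbf{r}},$$

since both sides equal $i(\mathbf{r}\times\mathbf{k})e^{i\mathbf{k}\cdot\mathbf{r}}$.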
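The identity can also be checked numerically. The sketch below is a minimal illustration, assuming central finite differences are accurate enough at an arbitrary sample point: it evaluates $[\mathbf{r}\times\nabla_r]e^{i\mathbf{k}\cdot\mathbf{r}}$ and $[\mathbf{k}\times\nabla_k]e^{i\mathbf{k}\cdot\mathbf{r}}$ and confirms that they are negatives of each other. The helper names and sample values are hypothetical.

```python
import cmath


def plane_wave(r, k):
    """e^{i k.r} for three-component r and k."""
    return cmath.exp(1j * sum(ri * ki for ri, ki in zip(r, k)))


def gradient(f, x, h=1e-5):
    """Central-difference gradient of a (complex) scalar function f at the point x."""
    grad = []
    for axis in range(3):
        xp, xm = list(x), list(x)
        xp[axis] += h
        xm[axis] -= h
        grad.append((f(xp) - f(xm)) / (2 * h))
    return grad


def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]


r = (0.3, -1.2, 0.7)   # arbitrary sample point
k = (2.0, 0.5, -1.1)   # arbitrary wave vector

lhs = cross(r, gradient(lambda rr: plane_wave(rr, k), r))   # [r x grad_r] e^{ik.r}
rhs = cross(k, gradient(lambda kk: plane_wave(r, kk), k))   # [k x grad_k] e^{ik.r}
mismatch = max(abs(a + b) for a, b in zip(lhs, rhs))         # lhs should equal -rhs
print(mismatch)
```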


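Thus, Eq. (9.50) becomes

$$\langle\mathbf{L}\rangle = \frac{i\hbar}{(2\pi)^3}\iiint d\mathbf{k}\,d\mathbf{k}'\,d\mathbf{r}\,\phi^*(\mathbf{k}')\phi(\mathbf{k})[\mathbf{k}\times\nabla_k]e^{i(\mathbf{k}-\mathbf{k}')\cdot\mathbf{r}},$$

where $\nabla_k$ acts only on the exponential.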


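Making use of the fact that the expectation value must be real, $\langle\mathbf{L}\rangle = \langle\mathbf{L}\rangle^*$, we may rewrite this as

$$\langle\mathbf{L}\rangle = -\frac{i\hbar}{(2\pi)^3}\iint d\mathbf{k}\,d\mathbf{k}'\,\phi^*(\mathbf{k})\phi(\mathbf{k}')[\mathbf{k}\times\nabla_k]\int d\mathbf{r}\,e^{-i(\mathbf{k}-\mathbf{k}')\cdot\mathbf{r}}$$
$$= -i\hbar\int d\mathbf{k}\,\phi^*(\mathbf{k})\,\mathbf{k}\times\int d\mathbf{k}'\,\phi(\mathbf{k}')\nabla_k\delta(\mathbf{k}'-\mathbf{k}).$$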


The integral involving the delta function is simply the three-dimensional generalization of Eq. (IV.11) of Appendix IV and yields

$$\langle\mathbf{L}\rangle = -i\hbar\int d\mathbf{k}\,\phi^*(\mathbf{k})(\mathbf{k}\times\nabla_k)\phi(\mathbf{k}).$$


If, then, $\phi(\mathbf{k})$ is taken as the state function of the particle in momentum space, the orbital angular momentum operator in momentum space is seen to be

$$\mathbf{L}_{\text{op}} = -i\hbar[\mathbf{k}\times\nabla_k],$$

which, of course, represents three operators, one for each component. For example,

$$L_z = -i\hbar\left(k_x\frac{\partial}{\partial k_y} - k_y\frac{\partial}{\partial k_x}\right).$$

It is interesting to note that these operators obey the general commutation rules for angular momentum operators:

$$[L_i, L_j] = i\hbar L_k, \qquad [L^2, L_i] = 0.$$
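The first of these rules can be verified exactly on polynomial test functions. The sketch below is a minimal illustration, assuming units with $\hbar = 1$ and a hypothetical representation of a polynomial in $k_x, k_y, k_z$ as a dictionary from exponent triples to coefficients; it applies $L_i = -i\hbar(k_j\,\partial/\partial k_k - k_k\,\partial/\partial k_j)$ and checks $[L_x, L_y]\phi = i\hbar L_z\phi$ for an arbitrary $\phi$.

```python
from collections import defaultdict

HBAR = 1.0  # units with hbar = 1, an assumption made only for this sketch


def derivative(poly, axis):
    """d/dk_axis of a polynomial stored as {(nx, ny, nz): coefficient}."""
    out = defaultdict(complex)
    for powers, coeff in poly.items():
        n = powers[axis]
        if n:
            lowered = list(powers)
            lowered[axis] -= 1
            out[tuple(lowered)] += n * coeff
    return dict(out)


def times_k(poly, axis):
    """Multiply a polynomial by k_axis."""
    out = defaultdict(complex)
    for powers, coeff in poly.items():
        raised = list(powers)
        raised[axis] += 1
        out[tuple(raised)] += coeff
    return dict(out)


def combine(p, q, weight=1):
    """Return p + weight * q, dropping coefficients that cancel."""
    out = defaultdict(complex, p)
    for powers, coeff in q.items():
        out[powers] += weight * coeff
    return {powers: c for powers, c in out.items() if abs(c) > 1e-12}


def L(i, poly):
    """Momentum-space L_i = -i hbar (k_j d/dk_k - k_k d/dk_j), with (i, j, k) cyclic."""
    j, k = (i + 1) % 3, (i + 2) % 3
    bracket = combine(times_k(derivative(poly, k), j), times_k(derivative(poly, j), k), -1)
    return {powers: -1j * HBAR * c for powers, c in bracket.items()}


# Arbitrary test function: phi = kx^2 ky + (2 - i) ky kz^3 + 0.5 kx ky kz
phi = {(2, 1, 0): 1.0, (0, 1, 3): 2.0 - 1.0j, (1, 1, 1): 0.5}
commutator = combine(L(0, L(1, phi)), L(1, L(0, phi)), -1)           # [L_x, L_y] phi
target = {powers: 1j * HBAR * c for powers, c in L(2, phi).items()}  # i hbar L_z phi
print(combine(commutator, target, -1))   # an empty dict means the two sides agree
```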
Finally, the operators discussed in this section do correspond to our ideas of orbital angular momentum. The quantum mechanical "particle" which is described by a simple scalar function of position and time corresponds to the classical point particle. The quantity $\psi^*\psi\,d\mathbf{r}$ tells us the probability of finding or not finding the particle within the volume element $d\mathbf{r}$.




There is no information contained in $\psi(\mathbf{r})$ about any possible internal structure of the particle. The operator $-i\hbar\nabla_r$ yields information about the linear momentum of this simple particle. The operators $-i\hbar\,\mathbf{r}\times\nabla_r$ and $-i\hbar\,\mathbf{k}\times\nabla_k$ specifically refer to the angular momentum of the particle in respect to a given origin.

This is not to say that the classical expression for angular momentum, $\mathbf{L} = \mathbf{r}\times\mathbf{p}$, generalizes to nothing but orbital angular momentum operators in quantum mechanics. When the classical momentum, $\mathbf{p}$, has a greater complexity than that of a simple particle, e.g., in the electromagnetic field, we shall find that the angular momentum operators contain more than orbital terms.


9.5 THE ANGULAR MOMENTUM OF THE PHOTON 


In Chapter 8 it was seen that an acceptable state function, $\mathbf{f}(\mathbf{k})$, could be found for the photon in momentum space. We now wish to find the correct angular momentum operators to associate with the momentum space representation of the electromagnetic field. We begin with the classical angular momentum of the electromagnetic field.

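The linear momentum density of the field is given by

$$\mathbf{g} = \frac{1}{c^2}\,\mathbf{E}(\mathbf{r})\times\mathbf{H}(\mathbf{r}).$$

We may associate with this an angular momentum density of the field

$$\frac{1}{c^2}\,\mathbf{r}\times(\mathbf{E}\times\mathbf{H}),$$

and for the angular momentum of the total field we have

$$\mathbf{J} = \frac{1}{c^2}\int d\mathbf{r}\,\mathbf{r}\times(\mathbf{E}\times\mathbf{H}).$$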


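Proceeding along the lines of Chapter 8, we write $\mathbf{E}$ and $\mathbf{H}$ in terms of their Fourier transforms, $\boldsymbol{\mathcal{E}}(\mathbf{k})$ and $\boldsymbol{\mathcal{H}}(\mathbf{k})$. These are then expressed in terms of the state function, $\mathbf{f}(\mathbf{k})$. After a great deal of manipulation, which is given in Appendix VI, one obtains for the $\alpha$-component of $\mathbf{J}$:

$$J_\alpha = \sum_{\beta=1}^{3}\int d\mathbf{k}\,f_\beta^*[-i\hbar\,\mathbf{k}\times\nabla_k]_\alpha f_\beta + \int d\mathbf{k}\,[-i\hbar\,\mathbf{f}^*\times\mathbf{f}]_\alpha. \qquad (9.51)$$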


The first term may be interpreted in terms of the orbital angular momentum operator in $k$-space and is in the form of an expectation value. Here the expectation value includes both a summation over the discrete components of the state function and an integration over the continuous variable $\mathbf{k}$. The summation over the components indicates that each component of the state function can contribute to each component of the orbital angular momentum.




As an example of orbital angular momentum, we might consider the approximate plane wave of Section 9.1 displaced in such a way that it does not pass through the origin. It would then possess a nonzero orbital angular momentum about the origin in much the same way as the ball discussed in Section 2.7.

The second integral in Eq. (9.51) is something new and is not in the form 
of a bracket product or expectation value. Let us show that it can easily 
be cast in this form. 

Consider any two vectors, whose components may be complex, which we call $\mathbf{A}$ and $\mathbf{B}$. By the ordinary rules of vector algebra we may compute the components of the vector cross product $\mathbf{A}^*\times\mathbf{B}$. Now, if we think of these vectors as kets $|A\rangle$ and $|B\rangle$, $\mathbf{A}^*\times\mathbf{B}$ may be written as an operator bracket product.

We introduce the three matrices $\Sigma_i$, given by


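$$\Sigma_x = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & -1 & 0 \end{pmatrix}, \qquad \Sigma_y = \begin{pmatrix} 0 & 0 & -1 \\ 0 & 0 & 0 \\ 1 & 0 & 0 \end{pmatrix}, \qquad \Sigma_z = \begin{pmatrix} 0 & 1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \qquad (9.52)$$

that is, $(\Sigma_i)_{jk} = \epsilon_{ijk}$. By direct computation, one may show that the $x$-component of the cross product $\mathbf{A}^*\times\mathbf{B}$ is correctly given by

$$(\mathbf{A}^*\times\mathbf{B})_x = \langle A|\Sigma_x|B\rangle,$$

and similarly for the $y$- and $z$-components.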
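A quick numerical check of this bracket representation of the cross product is sketched below. It is a minimal illustration, assuming the sign convention $(\Sigma_i)_{jk} = \epsilon_{ijk}$ for the matrices of Eq. (9.52); the random complex vectors and helper names are hypothetical.

```python
import random

# Eq. (9.52): (Sigma_i)_{jk} = epsilon_{ijk}
SIGMA = [
    [[0, 0, 0], [0, 0, 1], [0, -1, 0]],   # Sigma_x
    [[0, 0, -1], [0, 0, 0], [1, 0, 0]],   # Sigma_y
    [[0, 1, 0], [-1, 0, 0], [0, 0, 0]],   # Sigma_z
]


def bracket(a, m, b):
    """<A|M|B> = sum_jk A_j^* M_jk B_k for three-component complex vectors."""
    return sum(a[j].conjugate() * m[j][k] * b[k] for j in range(3) for k in range(3))


def conj_cross(a, b):
    """Components of the cross product A* x B."""
    ac = [z.conjugate() for z in a]
    return [ac[1] * b[2] - ac[2] * b[1],
            ac[2] * b[0] - ac[0] * b[2],
            ac[0] * b[1] - ac[1] * b[0]]


random.seed(1)
A = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(3)]
B = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(3)]
cross = conj_cross(A, B)
max_error = max(abs(bracket(A, SIGMA[i], B) - cross[i]) for i in range(3))
print(max_error)   # rounding-level agreement for all three components
```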
If we further define a set of matrices, $S_\alpha$, by

$$S_\alpha = -i\hbar\Sigma_\alpha, \qquad (9.53)$$

the second integral in Eq. (9.51) may be written

$$\int d\mathbf{k}\,[-i\hbar\,\mathbf{f}^*\times\mathbf{f}]_\alpha = \langle f|S_\alpha|f\rangle, \qquad (9.54)$$

where $|f\rangle$ is the ket representation of the vector $\mathbf{f}(\mathbf{k})$ and the bracket product is understood to include an integration over the continuous variable $\mathbf{k}$ as well as the usual matrix multiplication.

It is the term given in Eq. (9.54) which represents the intrinsic or spin angular momentum of the photon. If we define the matrix

$$S^2 = S_x^2 + S_y^2 + S_z^2 = 2\hbar^2\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad (9.55)$$




we find that the set of matrices $S_x$, $S_y$, $S_z$, and $S^2$ is Hermitian and obeys the usual commutation rules for angular momentum operators:

$$[S_i, S_j] = i\hbar S_k, \qquad [S_i, S^2] = 0.$$
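These properties are easy to confirm by direct matrix arithmetic. The sketch below is a minimal check, assuming units with $\hbar = 1$ and the matrices $\Sigma_i$ with $(\Sigma_i)_{jk} = \epsilon_{ijk}$; it forms $S_\alpha = -i\hbar\Sigma_\alpha$ and tests Hermiticity, the cyclic commutators, and Eq. (9.55).

```python
HBAR = 1.0  # units with hbar = 1, an assumption of this sketch

# Eq. (9.52) with (Sigma_i)_{jk} = epsilon_{ijk}
SIGMA = [
    [[0, 0, 0], [0, 0, 1], [0, -1, 0]],
    [[0, 0, -1], [0, 0, 0], [1, 0, 0]],
    [[0, 1, 0], [-1, 0, 0], [0, 0, 0]],
]
S = [[[-1j * HBAR * v for v in row] for row in sig] for sig in SIGMA]  # Eq. (9.53)


def matmul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(3)) for c in range(3)] for r in range(3)]


def dagger(a):
    return [[a[c][r].conjugate() for c in range(3)] for r in range(3)]


def close(a, b, tol=1e-12):
    return all(abs(a[r][c] - b[r][c]) < tol for r in range(3) for c in range(3))


hermitian = all(close(s, dagger(s)) for s in S)

commutation_ok = True
for a, b, c in ((0, 1, 2), (1, 2, 0), (2, 0, 1)):   # [S_i, S_j] = i hbar S_k, cyclic
    ab, ba = matmul(S[a], S[b]), matmul(S[b], S[a])
    commutator = [[ab[r][s] - ba[r][s] for s in range(3)] for r in range(3)]
    commutation_ok &= close(commutator, [[1j * HBAR * v for v in row] for row in S[c]])

squares = [matmul(s, s) for s in S]
s_squared = [[sum(sq[r][col] for sq in squares) for col in range(3)] for r in range(3)]
two_hbar_sq = [[2 * HBAR**2 if r == col else 0 for col in range(3)] for r in range(3)]
is_two_hbar_squared = close(s_squared, two_hbar_sq)  # Eq. (9.55)
print(hermitian, commutation_ok, is_two_hbar_squared)   # True True True
```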
Once again, these commutation relations allow us to find simultaneous eigenstates of the angular momentum operator $S^2$ and any one component. As before, we pick the $z$-component.
Since the operator $S^2$ is proportional to the identity matrix, any three-component ket $|\chi\rangle$ is an eigenstate of it, with eigenvalue $2\hbar^2$. Thus, by Eq. (9.31), since $j(j+1) = 2$, the quantum number for the spin angular momentum of the photon must be $j = 1$. By Eq. (9.32) the projection of this angular momentum along the $z$-axis may take on the values $+\hbar$, $0$, $-\hbar$. The last result could also be obtained by solving the eigenvalue equation

$$S_z|\chi\rangle = m\hbar|\chi\rangle.$$


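Explicit solution of this equation also yields the eigenvectors associated with these states:

$$|\chi_0\rangle = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}, \qquad |\chi_\pm\rangle = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ \pm i \\ 0 \end{pmatrix}. \qquad (9.56)$$

The states $|\chi_0\rangle$, $|\chi_+\rangle$, and $|\chi_-\rangle$ are, respectively, associated with the eigenvalues $0$, $+\hbar$, and $-\hbar$. These eigenvectors have been normalized to satisfy

$$\langle\chi_\mu|\chi_\nu\rangle = \delta_{\mu\nu}.$$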
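The eigenvectors of Eq. (9.56) can be checked directly against $S_z = -i\hbar\Sigma_z$. The sketch below is a minimal verification, assuming units with $\hbar = 1$; it tests $S_z|\chi_m\rangle = m\hbar|\chi_m\rangle$ and the orthonormality condition.

```python
import math

HBAR = 1.0  # units with hbar = 1, an assumption of this sketch
SZ = [[0, -1j * HBAR, 0], [1j * HBAR, 0, 0], [0, 0, 0]]   # S_z = -i hbar Sigma_z
CHI = {
    0: [0, 0, 1],
    +1: [1 / math.sqrt(2), 1j / math.sqrt(2), 0],
    -1: [1 / math.sqrt(2), -1j / math.sqrt(2), 0],
}


def apply(m, v):
    """Matrix m acting on the column vector v."""
    return [sum(m[r][c] * v[c] for c in range(3)) for r in range(3)]


def inner(u, v):
    """<u|v> = sum_i u_i^* v_i."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))


# S_z |chi_m> = m hbar |chi_m>, Eq. (9.56)
residual = max(abs(x - m * HBAR * y)
               for m, v in CHI.items() for x, y in zip(apply(SZ, v), v))
# <chi_mu|chi_nu> = delta_{mu nu}
orthonormal = all(abs(inner(CHI[mu], CHI[nu]) - (1 if mu == nu else 0)) < 1e-12
                  for mu in CHI for nu in CHI)
print(residual, orthonormal)
```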


The states $|\chi_\pm\rangle$ come as something of a happy surprise. They are simply the ket vectors for right and left circularly polarized states when $\mathbf{k}$ is along the $z$-direction, with a third component, which has been made explicitly zero. The appearance of the third eigenstate, $|\chi_0\rangle$, with eigenvalue zero is unexpected: it would ordinarily indicate the existence of an observable state with zero projection of spin angular momentum along the $z$-direction. This state is not observed experimentally. Recall that all observable states must satisfy the transversality condition, Eq. (8.22). Now, for the waves which we have been considering, we have always




picked the direction of propagation to be the $z$-direction. If we continue this practice, then the vector $\mathbf{k}$ is of the form

$$|k\rangle = \begin{pmatrix} 0 \\ 0 \\ k \end{pmatrix} = k|\chi_0\rangle,$$

and, in accordance with the transversality condition, we find

$$\langle k|\chi_\pm\rangle = 0.$$

However, $\langle k|\chi_0\rangle$ is not zero. Thus, it is the transversality condition which forbids the experimental observation of the state $|\chi_0\rangle$ in this case. More will be said about this in the next section.


9.6 THE PURE STATES OF THE FREE PHOTON 


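The three eigenstates of the operator $S_z$, namely $|\chi_0\rangle$, $|\chi_+\rangle$, and $|\chi_-\rangle$, are three independent, orthonormal, three-dimensional vectors and, hence, form a basis for the three-component vectors defined at each point of $k$-space. They may therefore be used to expand any such vector. In particular, if $\mathbf{f}(\mathbf{k})$ is any photon state function, it may be written

$$\mathbf{f}(\mathbf{k}) = f_+(\mathbf{k})|\chi_+\rangle + f_0(\mathbf{k})|\chi_0\rangle + f_-(\mathbf{k})|\chi_-\rangle. \qquad (9.57)$$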
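Because the basis is orthonormal, the coefficients in Eq. (9.57) are simply the projections $f_\mu = \langle\chi_\mu|\mathbf{f}\rangle$. The sketch below illustrates this at a single, arbitrarily chosen value of $\mathbf{k}$; the sample vector and helper names are hypothetical.

```python
import math

R2 = 1 / math.sqrt(2)
CHI = {"+": [R2, 1j * R2, 0], "0": [0, 0, 1], "-": [R2, -1j * R2, 0]}   # Eq. (9.56)


def inner(u, v):
    """<u|v> = sum_i u_i^* v_i."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))


def expand(f):
    """Coefficients f_mu = <chi_mu|f> of Eq. (9.57) for one value of f(k)."""
    return {mu: inner(chi, f) for mu, chi in CHI.items()}


f = [0.3 + 0.1j, -0.2j, 0.5]   # arbitrary sample value of f at a single k
coeffs = expand(f)
rebuilt = [sum(coeffs[mu] * CHI[mu][axis] for mu in CHI) for axis in range(3)]
error = max(abs(x - y) for x, y in zip(rebuilt, f))
print(coeffs)
print(error)   # the expansion reproduces f up to rounding
```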


Now, the orbital angular momentum operators $\mathbf{L} = -i\hbar[\mathbf{k}\times\nabla_k]$ and $L^2$ operate only on the coefficients $f_0(\mathbf{k})$, $f_+(\mathbf{k})$, and $f_-(\mathbf{k})$, and not on the unit vectors $|\chi_0\rangle$, $|\chi_+\rangle$, and $|\chi_-\rangle$. The spin operators $S_i$ and $S^2$ operate only on these unit vectors. It follows that if one operates on a state function with both an orbital and a spin angular momentum operator, the order of operation has no effect. Thus, the orbital and spin operators commute. It follows that the total angular momentum operators,

$$J_\alpha = L_\alpha + S_\alpha, \qquad J^2 = (\mathbf{L} + \mathbf{S})^2,$$

again obey the commutation rules of Eq. (9.29). Moreover, $J_z$ commutes with $L_z$ and $S_z$, and $J^2$ commutes with $L^2$ and $S^2$, although $J^2$ does not commute with $L_z$ or $S_z$ separately.

We therefore have a set of five operators, $L^2$, $L_z$, $S^2$, $S_z$, and $J_z$, which mutually commute ($J^2$, $J_z$, $L^2$, and $S^2$ form another such set). It should follow then that one can find state functions of the photon which are simultaneously eigenstates of these five operators. Experimentally, this would mean that we could prepare pure photon states for which we could simultaneously measure the five associated physical quantities. This should be true regardless of which particular direction we have picked for the $z$-axis.


As we have seen in the last section, the transversality condition does not allow this much generality. If $\mathbf{k}$ is not along the $z$-axis, $\mathbf{k}\cdot\mathbf{f} = 0$ would




not be satisfied for either of the observable pure states of spin angular momentum,

$$f_+(\mathbf{k})|\chi_+\rangle \quad\text{or}\quad f_-(\mathbf{k})|\chi_-\rangle.$$


Thus, if the $z$-direction, along which we wish to measure the components of angular momentum, is not parallel to $\mathbf{k}$, the state of the photon must at best be a state of the form

$$|S\rangle = f_+(\mathbf{k})|\chi_+\rangle + f_0(\mathbf{k})|\chi_0\rangle + f_-(\mathbf{k})|\chi_-\rangle.$$

In this case it is only the total angular momentum $J^2$ and its component $J_z$ which may be measured, and the splitting of this momentum into spin and orbital parts is not well defined.
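The role of the propagation direction can be made concrete. The sketch below is a minimal illustration that evaluates $\mathbf{k}\cdot\boldsymbol{\chi}$ for the three spin eigenvectors, first with $\mathbf{k}$ along $z$ and then with $\mathbf{k}$ tilted by an arbitrary 30 degrees toward $x$; the helper `wave_vector` is hypothetical. Only for $\mathbf{k}$ along $z$ do $|\chi_\pm\rangle$ satisfy the transversality condition.

```python
import math

R2 = 1 / math.sqrt(2)
CHI = {"+": [R2, 1j * R2, 0], "0": [0, 0, 1], "-": [R2, -1j * R2, 0]}   # Eq. (9.56)


def k_dot(k, v):
    """k . v for a real wave vector k and a complex polarization vector v."""
    return sum(ka * va for ka, va in zip(k, v))


def wave_vector(magnitude, tilt):
    """Hypothetical helper: k of given magnitude, tilted by `tilt` radians from z toward x."""
    return [magnitude * math.sin(tilt), 0.0, magnitude * math.cos(tilt)]


along_z = wave_vector(1.0, 0.0)
tilted = wave_vector(1.0, math.radians(30))
for name, chi in CHI.items():
    print(name, abs(k_dot(along_z, chi)), abs(k_dot(tilted, chi)))
```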

We also note that the operators for linear momentum and angular momentum do not commute. Thus, we may not prepare photon state functions which are simultaneously pure states of linear momentum and total angular momentum. This is also seen from the function $\mathbf{f}(\mathbf{k})$ which determines the orbital angular momentum: in general it is a continuous function over all values of $\mathbf{k}$ in momentum space. The energy operator, being a derivative in respect to time, does commute with both momentum operators, however. Thus, we may prepare pure photon states

$$|S\rangle = |\omega, J, J_z\rangle \qquad (9.58)$$


which are pure in respect to energy, total angular momentum, and its $z$-component. Alternatively, we may prepare states

$$|S\rangle = |\omega, \mathbf{k}, \alpha\rangle \qquad (9.59)$$

which are pure in respect to energy, linear momentum, and, of course, polarization. Either type of pure state may be expressed as a coherent superposition of the other type.

Only in the special case where the $z$-axis is taken to be along the direction of propagation is there a simple relation between these two types of states. In this case the pure polarization states $|R\rangle$ and $|L\rangle$ are also pure states of spin angular momentum with $S_z = \pm\hbar$.


PROBLEMS 


9.1 Verify the relations for the operators $J_\pm$ given by Eqs. (9.36) through (9.40).


9.2 Verify that operators defined by Eq. (9.26) obey the commutation relations, 
Eq. (9.28). 




9.3 Verify that the operators defined by Eq. (9.53) obey the commutation relations, Eq. (9.29).

9.4 By direct computation find the eigenvectors and eigenvalues of the matrices 
defined by Eq. (9.53). 


9.5 Prove that if the sum and difference of two real numbers are integers, the 
numbers themselves are integers or half-integers. 


9.6 Show that if $E_0$ in Eq. (9.17) is allowed to be a function of $r$ and $\theta$, the $z$-component of Eq. (9.14) may still be satisfied identically although additional error terms are introduced into the $r$- and $\theta$-components.


9.7 In Chapter 5 the linear momentum operator was introduced by considering a small translation of the system. Consider a system which is described by a function $f(x, y)$. If this system is rotated through a small angle $d\phi$ about the $z$-axis, show that the new function $f'(x, y)$ can be obtained by a rotation operator

$$f'(x, y) = Rf(x, y),$$

where

$$R = I - \frac{i\,d\phi}{\hbar}L_z,$$

with $L_z$ as given by Eq. (9.26). By the discussion of Chapter 5 give a condition for the constancy of the $z$-component of angular momentum of the system.
9.8 Show that the commutation rules, Eq. (9.28), may be written in the compact form

$$\mathbf{L}\times\mathbf{L} = i\hbar\mathbf{L}.$$
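9.9 Show that in spherical coordinates (see Fig. 6.2) the angular momentum operators take on the forms

$$L_x = i\hbar\left(\cot\theta\cos\phi\frac{\partial}{\partial\phi} + \sin\phi\frac{\partial}{\partial\theta}\right),$$
$$L_y = i\hbar\left(\cot\theta\sin\phi\frac{\partial}{\partial\phi} - \cos\phi\frac{\partial}{\partial\theta}\right),$$
$$L_z = -i\hbar\frac{\partial}{\partial\phi},$$
$$L^2 = -\hbar^2\left[\frac{1}{\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial}{\partial\theta}\right) + \frac{1}{\sin^2\theta}\frac{\partial^2}{\partial\phi^2}\right].$$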

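9.10 Verify that the following are eigenfunctions of the orbital angular momentum operators with $j = 1$:

$$|1, \pm1\rangle = \mp\sqrt{\frac{3}{8\pi}}\,\sin\theta\,e^{\pm i\phi} = \mp\sqrt{\frac{3}{8\pi}}\,\frac{x \pm iy}{r},$$
$$|1, 0\rangle = \sqrt{\frac{3}{4\pi}}\,\cos\theta = \sqrt{\frac{3}{4\pi}}\,\frac{z}{r}.$$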




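9.11 For the eigenfunctions given in Problem 9.10 verify that

$$L_+|1, 1\rangle = 0, \qquad L_-|1, -1\rangle = 0,$$

and that $L_+|1, 0\rangle$ is proportional to $|1, +1\rangle$.

9.12 In terms of the matrices given by Eq. (9.52), we may define

$$\Sigma_\pm = \Sigma_x \pm i\Sigma_y.$$

Show that

$$\Sigma_+|\chi_+\rangle = \Sigma_-|\chi_-\rangle = 0$$

and that $\Sigma_+|\chi_0\rangle$ is proportional to $|\chi_+\rangle$.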
9.13 In relation to the development given in Section 9.1, consider the function

$$E_0(r) = \frac{1}{\exp[a(r - R_0)/R_0] + 1}.$$

Show that if $a$ is large, say $a = 100$, the function $E_0(r)$ is essentially constant for $r < R_0$ and nearly equal to zero for $r > R_0$, and varies appreciably only in a region near $r = R_0$ whose width is of the order of $R_0/a$. For $1\ \text{cm} < R_0 < 10\ \text{cm}$ and $\lambda = 5000$ Å show that the error terms

$$\frac{1}{rk^2E_0}\frac{\partial E_0}{\partial r}, \qquad \frac{1}{k^2E_0}\frac{\partial^2E_0}{\partial r^2}, \qquad \frac{1}{k^2E_0^2}\left(\frac{\partial E_0}{\partial r}\right)^2$$

are of order $(a/kR_0)^2$ or smaller and may be neglected.


9.14 Consider a system for which the appropriate angular momentum operators $J_x$, $J_y$, and $J_z$ are of the form


h=2 / 5 AK, 
p ^ 3 is 
where the square of each of the $K_i$ operators, $K_i^2$, is an identity operator. Prove that if a matrix representation can be found for the operators $J_i$, they must be $9 \times 9$-matrices or larger.


APPENDIX I 


MATRIX ALGEBRA 


For those who may be unfamiliar with matrices, the mastery of the rudiments of matrix algebra presented here is sufficient for comprehending the mathematical proofs in the text and the solutions of problems. An elementary knowledge of the method of solving simultaneous linear equations is assumed on the part of the student.


I.1 MATRICES AND THEIR PROPERTIES

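Consider the set of linear equations

$$\begin{aligned} y_1 &= a_{11}x_1 + a_{12}x_2 + a_{13}x_3, \\ y_2 &= a_{21}x_1 + a_{22}x_2 + a_{23}x_3, \\ y_3 &= a_{31}x_1 + a_{32}x_2 + a_{33}x_3, \end{aligned} \qquad (\text{AI.1})$$

where the $a_{ij}$ are the coefficients of the unknown variables $x_j$ ($i, j = 1, 2, 3$). This set of equations may be written as the array

$$\begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}. \qquad (\text{AI.2})$$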


Now, any array of numbers $a_{ij}$ set out in $m$ rows and $n$ columns is called a matrix of order $m$ by $n$, and the order is generally indicated as $m \times n$. In Eq. (AI.2) the array of the $a_{ij}$ forms a $3 \times 3$-matrix (which is read three by three, not three times three). The entire array of the $a_{ij}$ will be denoted by the symbol $A$. Each of the $a_{ij}$ is called an element of the matrix $A$ or simply a matrix element. A matrix may consist of a single column or a single row: the $x_i$ and $y_i$, for example, are the matrix elements of $3 \times 1$-matrices. Hereafter we shall frequently refer to a $1 \times n$-matrix as a row vector and an $n \times 1$-matrix as a column vector. Column vectors will be designated by the




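symbol $|\ )$ and row vectors by $(\ |$. This shorthand allows us to write the column vectors in Eq. (AI.2) as

$$|y) = \begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix}, \qquad |x) = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix},$$

and Eqs. (AI.1) and (AI.2) can be succinctly written as

$$|y) = A|x), \qquad (\text{AI.3})$$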


in which the column vector |y) is interpreted as the result of the operation 
of the matrix A on the column vector |x). 

If Eqs. (AI.3) and (AI.1) are to express the same relations among the variables, there must be some rule by which Eq. (AI.1) can be extracted from Eq. (AI.2). That is, the matrix elements of $A$, the $a_{ij}$, must multiply the elements $x_j$ to obtain the $y_i$, as specified in Eq. (AI.1). It is easy to see that the matrix elements of the vector $|y)$ are obtained from

$$y_i = \sum_{j=1}^{3} a_{ij}x_j,$$

and this prescribes how the matrix elements of $A$ must multiply those of $|x)$ for Eq. (AI.2) to convey the same information as Eq. (AI.1).

The correspondence between Eqs. (AI.1) and (AI.2) has been indicated, 
but now some other properties of matrices must be specified before we can 
proceed further. 


a) Two matrices $A$ and $B$ are equal if and only if they are of the same order and if their corresponding matrix elements are equal: i.e., for any $i$ and $j$, $a_{ij} = b_{ij}$.

b) The sum or difference of two matrices $A$ and $B$ (whose orders are equal) is defined as a matrix $C$ such that the elements of $C$ are given by

$$c_{ij} = a_{ij} \pm b_{ij}.$$

c) If $A$ is any matrix and $\alpha$ is any scalar, then the matrix $A' = \alpha A$ is that obtained by multiplying each matrix element of $A$ by $\alpha$.

d) The matrix elements $c_{ij}$ of the product $C = AB$ are defined by the relation $c_{ij} = \sum_k a_{ik}b_{kj}$. This implies that the number of columns in $A$ must equal the number of rows of $B$.

e) In general, $AB \ne BA$; that is, matrix products do not commute.

f) Matrix multiplication is associative: $A(BC) = (AB)C$.

g) Matrix multiplication is distributive: $A(B + C) = AB + AC$.




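As examples of matrix addition and multiplication the following are given:

$$A = \begin{pmatrix} 2 & 1 & 0 \\ 3 & 5 & 4 \\ 1 & 3 & 0 \end{pmatrix}, \qquad B = \begin{pmatrix} 3 & 2 & 2 \\ 1 & -1 & 0 \\ 0 & 3 & 1 \end{pmatrix},$$

$$A + B = \begin{pmatrix} 5 & 3 & 2 \\ 4 & 4 & 4 \\ 1 & 6 & 1 \end{pmatrix},$$

$$AB = \begin{pmatrix} 6+1+0 & 4-1+0 & 4+0+0 \\ 9+5+0 & 6-5+12 & 6+0+4 \\ 3+3+0 & 2-3+0 & 2+0+0 \end{pmatrix} = \begin{pmatrix} 7 & 3 & 4 \\ 14 & 13 & 10 \\ 6 & -1 & 2 \end{pmatrix}.$$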
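The rules b) and d) translate directly into code. The sketch below is a minimal illustration using plain Python lists (not any particular linear-algebra library); it reproduces the example above and shows that $BA$ differs from $AB$, as stated in rule e).

```python
def mat_add(a, b):
    """Sum of two matrices of equal order: c_ij = a_ij + b_ij."""
    if len(a) != len(b) or len(a[0]) != len(b[0]):
        raise ValueError("matrices must have the same order")
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def mat_mul(a, b):
    """Product c_ij = sum_k a_ik b_kj; the columns of a must equal the rows of b."""
    if len(a[0]) != len(b):
        raise ValueError("number of columns of A must equal number of rows of B")
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


A = [[2, 1, 0], [3, 5, 4], [1, 3, 0]]
B = [[3, 2, 2], [1, -1, 0], [0, 3, 1]]
print(mat_add(A, B))                    # [[5, 3, 2], [4, 4, 4], [1, 6, 1]]
print(mat_mul(A, B))                    # [[7, 3, 4], [14, 13, 10], [6, -1, 2]]
print(mat_mul(A, B) == mat_mul(B, A))   # False: AB != BA
```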

I.2 DEFINITIONS

a) A diagonal matrix is one in which the matrix elements $a_{ij} = 0$ for all $i \ne j$.

b) The unit matrix or identity matrix $I$ is a diagonal matrix in which $a_{ii} = 1$ for all $i$.

c) A matrix which has the same number of rows and columns is called a square matrix.

d) A singular matrix is a square matrix whose determinant is zero. If the determinant is not zero, the matrix is non-singular.

e) The matrix elements of $A^*$, the complex conjugate of $A$, are the complex conjugates of the elements of $A$.

f) If $A = A^*$, $A$ is a real matrix.

g) The interchange of rows and columns of any matrix $A$ results in a new matrix $\tilde{A}$ called the transpose of $A$. The elements $\tilde{a}_{ij}$ of the transpose are related to those of $A$ by $\tilde{a}_{ij} = a_{ji}$.

h) The matrix $A^\dagger = \tilde{A}^*$ is called the adjoint of $A$.

i) A matrix is Hermitian if $A^\dagger = A$.

j) A matrix is unitary if $AA^\dagger = I$.

k) A real matrix is orthogonal if $A\tilde{A} = I$.

l) The trace of a square matrix $A$ is the sum of the diagonal elements of $A$ and is written Tr $(A)$. The trace of the product of two matrices does not depend on the order of the factors, Tr $(AB)$ = Tr $(BA)$, although in general it is not equal to the product of the traces.
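The identity Tr $(AB)$ = Tr $(BA)$ holds for any two square matrices of the same order, whereas Tr $(A)$ Tr $(B)$ is generally different. The sketch below checks both statements with the example matrices $A$ and $B$ of Section I.1, using plain Python lists.

```python
def mat_mul(a, b):
    """Product c_ij = sum_k a_ik b_kj of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def trace(m):
    """Sum of the diagonal elements of a square matrix."""
    return sum(m[i][i] for i in range(len(m)))


A = [[2, 1, 0], [3, 5, 4], [1, 3, 0]]
B = [[3, 2, 2], [1, -1, 0], [0, 3, 1]]
print(trace(mat_mul(A, B)), trace(mat_mul(B, A)), trace(A) * trace(B))   # 22 22 21
```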




I.3 THE INVERSE MATRIX


We represented a set of linear equations in Eq. (AI.1). If the determinant of $A$ (written as Det $A$ or $|A|$) is nonzero, it is possible to solve for $|x)$ in terms of $|y)$. That is,

$$|x) = B|y). \qquad (\text{AI.4})$$

Given the matrix $A$, the problem is to determine $B$. The relation that $A$ and $B$ must satisfy can be obtained by substituting for $|x)$ from Eq. (AI.4) into Eq. (AI.3):

$$|y) = AB|y),$$

which means that $AB = I$.

We will show how $B$ can be constructed for any non-singular matrix $A$. Moreover, it turns out that $B$ is unique and is called the inverse of $A$, which is designated as $B = A^{-1}$. We will use the $3 \times 3$-matrix $A$ as given in Eq. (AI.2) as an example. The method presented can be extended to $n \times n$-matrices but is quite tedious for $n > 3$. Recall that the cofactor of a matrix element $a_{ij}$ in $A$ is $(-1)^{i+j}$ times the determinant of the matrix obtained when the $i$-th row and $j$-th column are deleted from $A$. Designating the cofactor of $a_{ij}$ as $|A_{ij}|$, the determinant of the matrix $A$ can be written in terms of its cofactors as

$$|A| = \sum_{j=1}^{3} a_{ij}|A_{ij}| \quad \text{for fixed } i,$$
$$|A| = \sum_{i=1}^{3} a_{ij}|A_{ij}| \quad \text{for fixed } j.$$


We now construct from $A$ the matrix $\bar{A}$, such that the cofactors of the elements of the $i$-th row of $A$ are the elements in the $i$-th column of $\bar{A}$. That is,

$$\bar{A} = \begin{pmatrix} |A_{11}| & |A_{21}| & |A_{31}| \\ |A_{12}| & |A_{22}| & |A_{32}| \\ |A_{13}| & |A_{23}| & |A_{33}| \end{pmatrix}.$$

Let us see what happens when we take the product $A\bar{A}$. The matrix elements of the product will be

$$(A\bar{A})_{ij} = \sum_{k=1}^{3} a_{ik}|A_{jk}|.$$

For $i = j$,

$$(A\bar{A})_{ii} = \sum_{k=1}^{3} a_{ik}|A_{ik}| = |A|$$




and

$$(A\bar{A})_{11} = (A\bar{A})_{22} = (A\bar{A})_{33} = |A|.$$

If $j \ne i$, the sum involves the product of the elements of one row with the cofactors of another row, which is equivalent to the expansion of a determinant which has two identical rows. Consequently,

$$(A\bar{A})_{ij} = \sum_{k=1}^{3} a_{ik}|A_{jk}| = 0 \quad \text{for } i \ne j.$$

Therefore, the product $A\bar{A}$ is

$$A\bar{A} = \begin{pmatrix} |A| & 0 & 0 \\ 0 & |A| & 0 \\ 0 & 0 & |A| \end{pmatrix} = |A|\,I.$$

Since $AA^{-1} = I$, we can say that

$$A^{-1} = \frac{\bar{A}}{|A|}. \qquad (\text{AI.5})$$


The inverse of a matrix is unique, because if we have another matrix, $C$, such that $AC = I$, then

$$C = IC = (A^{-1}A)C = A^{-1}(AC) = A^{-1}.$$


Finally, Eq. (AI.5) is equally valid for real matrices or for matrices with 
complex elements. 

As an example of the above, the determinant of the matrix $A$ given in Section I.1 is $-20$ and the inverse of $A$ is found to be

$$A^{-1} = -\frac{1}{20}\begin{pmatrix} -12 & 0 & 4 \\ 4 & 0 & -8 \\ 4 & -5 & 7 \end{pmatrix}.$$
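The cofactor construction of Eq. (AI.5) can be carried out exactly with rational arithmetic. The sketch below is a minimal illustration using Python's `fractions.Fraction` and a recursive cofactor expansion for the determinant; it is practical only for small matrices, in line with the remark that the method becomes tedious for $n > 3$.

```python
from fractions import Fraction


def minor(m, i, j):
    """Matrix obtained by deleting row i and column j."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(m) if r != i]


def det(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det(minor(m, 0, j)) for j in range(len(m)))


def inverse(m):
    """A^{-1} = Abar / |A|, where the (i, j) element of Abar is the cofactor |A_ji|, Eq. (AI.5)."""
    d = det(m)
    if d == 0:
        raise ValueError("matrix is singular")
    n = len(m)
    cof = [[(-1) ** (i + j) * det(minor(m, i, j)) for j in range(n)] for i in range(n)]
    return [[Fraction(cof[j][i], d) for j in range(n)] for i in range(n)]


A = [[2, 1, 0], [3, 5, 4], [1, 3, 0]]
A_inv = inverse(A)
print(det(A))   # -20
product = [[sum(A[i][k] * A_inv[k][j] for k in range(3)) for j in range(3)] for i in range(3)]
print(product == [[1, 0, 0], [0, 1, 0], [0, 0, 1]])   # True
```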


I.4 THE MATRIX REPRESENTATION OF VECTORS 
In ordinary vector notation any vector is generally expressed in the form

$$\mathbf{A} = A_1\mathbf{e}_1 + A_2\mathbf{e}_2 + A_3\mathbf{e}_3,$$

where $\mathbf{e}_1$, $\mathbf{e}_2$, and $\mathbf{e}_3$ are unit basis vectors specifying the direction of the axes of a Cartesian coordinate system, and $A_1$, $A_2$, and $A_3$ are the components of the vector $\mathbf{A}$ along these axes. As we know, the scalar or dot product of the vector $\mathbf{A}$ with the vector $\mathbf{B} = B_1\mathbf{e}_1 + B_2\mathbf{e}_2 + B_3\mathbf{e}_3$ is

$$\mathbf{A}\cdot\mathbf{B} = A_1B_1 + A_2B_2 + A_3B_3.$$


The manner in which A and B have been written above is really a convention. 




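So long as the order of the components is specified and maintained, the vectors $\mathbf{A}$ and $\mathbf{B}$ could both be written as column vectors

$$|A) = \begin{pmatrix} A_1 \\ A_2 \\ A_3 \end{pmatrix}, \qquad |B) = \begin{pmatrix} B_1 \\ B_2 \\ B_3 \end{pmatrix}.$$

The usual operations of addition of vectors and the multiplication of a vector by a scalar can be performed. By definition,

$$\alpha(\mathbf{A} + \mathbf{B}) = \alpha(A_1 + B_1)\mathbf{e}_1 + \alpha(A_2 + B_2)\mathbf{e}_2 + \alpha(A_3 + B_3)\mathbf{e}_3,$$

which in our notation becomes

$$\alpha[|A) + |B)] = \begin{pmatrix} \alpha(A_1 + B_1) \\ \alpha(A_2 + B_2) \\ \alpha(A_3 + B_3) \end{pmatrix}.$$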


What of the scalar product? If we are to represent vectors by column vectors, the scalar product can be taken in the following way. Since $|A)$ and $|B)$ are column vectors, their transposes are given by

$$(A| = (A_1, A_2, A_3), \qquad (B| = (B_1, B_2, B_3),$$

and it is seen that the scalar product may be written as a matrix multiplication

$$\mathbf{A}\cdot\mathbf{B} = (A|B) = (A_1, A_2, A_3)\begin{pmatrix} B_1 \\ B_2 \\ B_3 \end{pmatrix} = A_1B_1 + A_2B_2 + A_3B_3,$$

where the multiplication $(A|B)$ is the matrix product as defined in Section I.1.

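We have shown the column and row representation of the vectors $\mathbf{A}$ and $\mathbf{B}$, but nothing was said about the representation of the basis vectors, $\mathbf{e}_i$. If we write

$$|A) = \begin{pmatrix} A_1 \\ A_2 \\ A_3 \end{pmatrix} = A_1\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} + A_2\begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} + A_3\begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix},$$

the column vector representation of the $\mathbf{e}_i$ can be seen to be

$$|e_1) = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \qquad |e_2) = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \qquad |e_3) = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix},$$

and $|A)$ can be expressed as

$$|A) = \sum_{i=1}^{3} A_i|e_i) = A_1|e_1) + A_2|e_2) + A_3|e_3).$$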


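With this representation, the scalar product can also be written

$$(B|A) = \Bigl[\sum_{j=1}^{3} B_j(e_j|\Bigr]\Bigl[\sum_{i=1}^{3} A_i|e_i)\Bigr] = \sum_{i,j} B_jA_i(e_j|e_i) = \sum_{i=1}^{3} A_iB_i,$$

since

$$(e_j|e_i) = \delta_{ij}, \qquad \delta_{ij} = \begin{cases} 1 & \text{for } i = j, \\ 0 & \text{for } i \ne j. \end{cases}$$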




APPENDIX II 


JONES AND MUELLER MATRICES 


Certain matrices are tabulated in this appendix which appear within the 
context of the Jones and Mueller calculuses. The selection has been made 
on the basis of pertinence to the main text and is not meant to be complete. 


II.1 TRANSFORMATION MATRICES


Any two perpendicular directions $x$ and $y$ may be used to determine basis states $|P_x\rangle$ and $|P_y\rangle$ for the Jones and Mueller calculuses. It is convenient to be able to transform from one such basis to another basis $|P_{x'}\rangle$, $|P_{y'}\rangle$. The primed axes will be taken as rotated through angle $\theta$ counterclockwise from the unprimed axes. The matrices given below, multiplying a Jones or Stokes vector from the left, perform the desired transformation. The same matrices also transform a matrix $M$, in either the Jones or Mueller calculus, from one basis to the other through the similarity transformation $M' = UMU^{-1}$:

$$\begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix} \ (\text{Jones}), \qquad \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos 2\theta & \sin 2\theta & 0 \\ 0 & -\sin 2\theta & \cos 2\theta & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \ (\text{Mueller}).$$


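Within the framework of the Jones calculus, frequent use can be made of the basis $|R\rangle$, $|L\rangle$. The matrix which transforms a Jones vector from the basis $|P_x\rangle$, $|P_y\rangle$ to the basis $|R\rangle$, $|L\rangle$ is given below. Through a similarity transformation this matrix also transforms any Jones matrix:

$$\frac{1}{\sqrt{2}}\begin{pmatrix} e^{i\pi/4} & e^{-i\pi/4} \\ e^{-i\pi/4} & e^{i\pi/4} \end{pmatrix} = \frac{e^{i\pi/4}}{\sqrt{2}}\begin{pmatrix} 1 & -i \\ -i & 1 \end{pmatrix}.$$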






In the following, Jones matrices will be given in both a $|P_x\rangle$, $|P_y\rangle$- and a $|R\rangle$, $|L\rangle$-basis. These representations will be distinguished by subscripts on the matrix, such as $M_{XY}$ or $M_{RL}$. All Mueller matrices are given in a $|P_x\rangle$, $|P_y\rangle$-basis.


II.2 AN ELLIPTICAL POLARIZER


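The matrices below correspond to an elliptical filter which produces and/or transmits unchanged the pure state:

$$|E\rangle = \cos\theta\,e^{-i\phi/2}|P_x\rangle + \sin\theta\,e^{i\phi/2}|P_y\rangle,$$

$$\begin{pmatrix} \cos^2\theta & e^{-i\phi}\cos\theta\sin\theta \\ e^{i\phi}\cos\theta\sin\theta & \sin^2\theta \end{pmatrix}_{XY},$$

$$\frac{1}{2}\begin{pmatrix} 1 + \sin 2\theta\sin\phi & \sin 2\theta\cos\phi + i\cos 2\theta \\ \sin 2\theta\cos\phi - i\cos 2\theta & 1 - \sin 2\theta\sin\phi \end{pmatrix}_{RL},$$

$$\frac{1}{2}\begin{pmatrix} 1 & \cos 2\theta & \sin 2\theta\cos\phi & \sin 2\theta\sin\phi \\ \cos 2\theta & \cos^2 2\theta & \tfrac12\sin 4\theta\cos\phi & \tfrac12\sin 4\theta\sin\phi \\ \sin 2\theta\cos\phi & \tfrac12\sin 4\theta\cos\phi & \sin^2 2\theta\cos^2\phi & \tfrac12\sin^2 2\theta\sin 2\phi \\ \sin 2\theta\sin\phi & \tfrac12\sin 4\theta\sin\phi & \tfrac12\sin^2 2\theta\sin 2\phi & \sin^2 2\theta\sin^2\phi \end{pmatrix}.$$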
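The $|R\rangle$, $|L\rangle$-basis matrix can be checked against the $|P_x\rangle$, $|P_y\rangle$-basis projector by a similarity transformation. The sketch below assumes the transformation matrix $U = \frac{1}{\sqrt2}\begin{pmatrix} 1 & -i \\ -i & 1 \end{pmatrix}$ of Section II.1 with its overall phase factor dropped (an overall phase cancels in $UMU^{-1}$), and uses $U^{-1} = U^\dagger$ since $U$ is unitary. The test angles are arbitrary.

```python
import cmath
import math

INV_SQRT2 = 1 / math.sqrt(2)
U = [[INV_SQRT2, -1j * INV_SQRT2], [-1j * INV_SQRT2, INV_SQRT2]]   # P_x, P_y -> R, L


def matmul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(2)) for c in range(2)] for r in range(2)]


def dagger(a):
    return [[complex(a[c][r]).conjugate() for c in range(2)] for r in range(2)]


def close(a, b, tol=1e-12):
    return all(abs(a[r][c] - b[r][c]) < tol for r in range(2) for c in range(2))


def elliptical_polarizer_xy(theta, phi):
    """Projector |E><E| with |E> = cos(theta) e^{-i phi/2}|P_x> + sin(theta) e^{i phi/2}|P_y>."""
    e = [math.cos(theta) * cmath.exp(-0.5j * phi), math.sin(theta) * cmath.exp(0.5j * phi)]
    return [[e[r] * e[c].conjugate() for c in range(2)] for r in range(2)]


def elliptical_polarizer_rl(theta, phi):
    """Closed-form |R>, |L>-basis matrix tabulated above."""
    s2t, c2t = math.sin(2 * theta), math.cos(2 * theta)
    return [[0.5 * (1 + s2t * math.sin(phi)), 0.5 * (s2t * math.cos(phi) + 1j * c2t)],
            [0.5 * (s2t * math.cos(phi) - 1j * c2t), 0.5 * (1 - s2t * math.sin(phi))]]


theta, phi = 0.4, 1.1   # arbitrary test angles
m_rl = matmul(matmul(U, elliptical_polarizer_xy(theta, phi)), dagger(U))   # U M U^dagger
print(close(m_rl, elliptical_polarizer_rl(theta, phi)))
```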


Special cases of these matrices follow. 


II.3 CIRCULAR POLARIZERS


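The matrices for right and left circular polarizing filters are obtained from the elliptical filter matrices by setting $\phi = \pm\pi/2$ and $\theta = \pi/4$. In the following the upper sign is associated with right circular polarization states and the lower sign with left circular polarization states:

$$\frac{1}{2}\begin{pmatrix} 1 & \mp i \\ \pm i & 1 \end{pmatrix}_{XY}, \qquad \frac{1}{2}\begin{pmatrix} 1 \pm 1 & 0 \\ 0 & 1 \mp 1 \end{pmatrix}_{RL},$$

$$\frac{1}{2}\begin{pmatrix} 1 & 0 & 0 & \pm 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ \pm 1 & 0 & 0 & 1 \end{pmatrix}.$$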


II.4 THE PLANE POLARIZER


The matrices for a plane polarizing filter are obtained from those for the general elliptical filter by setting $\phi = 0$. The angle $\theta$ then corresponds to the




angle between the line of the P-state produced by the filter and the x-axis of 


the coordinate system. 
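$$M_{XY} = \begin{pmatrix}\cos^2\theta & \cos\theta\sin\theta\\ \cos\theta\sin\theta & \sin^2\theta\end{pmatrix},\qquad M_{RL} = \frac{1}{2}\begin{pmatrix}1 & i e^{-2i\theta}\\ -i e^{2i\theta} & 1\end{pmatrix},$$

$$M = \frac{1}{2}\begin{pmatrix}1 & \cos2\theta & \sin2\theta & 0\\ \cos2\theta & \cos^2 2\theta & \tfrac{1}{2}\sin4\theta & 0\\ \sin2\theta & \tfrac{1}{2}\sin4\theta & \sin^2 2\theta & 0\\ 0 & 0 & 0 & 0\end{pmatrix}.$$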
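Two quick consequences of the plane-polarizer matrix can be checked: M_XY is idempotent (a second identical filter changes nothing), and an x-directed P-state of unit intensity emerges with intensity cos²θ, which is Malus' law. The helper names below are hypothetical; this is a minimal sketch, not a procedure taken from the text.

```python
import math


def plane_polarizer_xy(theta):
    """Jones matrix M_XY of a plane polarizer whose transmission line makes angle theta with x."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c * c, c * s], [c * s, s * s]]


def apply(m, v):
    """Act with a 2x2 Jones matrix on a Jones vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]


def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def transmitted_fraction(theta):
    """Intensity passed for an incident x-directed P-state of unit intensity."""
    out = apply(plane_polarizer_xy(theta), [1.0, 0.0])
    return abs(out[0]) ** 2 + abs(out[1]) ** 2


for deg in (0, 30, 60, 90):
    t = math.radians(deg)
    print(deg, round(transmitted_fraction(t), 6), round(math.cos(t) ** 2, 6))
```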


II.5 LINEAR RETARDERS
In the following, the retardance δ is given by
$$\delta = kt(n_s - n_f).$$

The quantities n_s and n_f are the indices of refraction for the slow and fast P-states, which are transmitted unchanged by the plate. The thickness of the plate is t. The angle θ is the angle between the x-axis of the coordinate system and the line of the fast P-state. The symbol k is related to the wavelength λ through k = 2π/λ.


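$$M_{XY} = \begin{pmatrix}\cos^2\theta\, e^{-i\delta/2} + \sin^2\theta\, e^{i\delta/2} & -2i\sin(\delta/2)\cos\theta\sin\theta\\ -2i\sin(\delta/2)\cos\theta\sin\theta & \cos^2\theta\, e^{i\delta/2} + \sin^2\theta\, e^{-i\delta/2}\end{pmatrix},$$

$$M_{RL} = \begin{pmatrix}\cos(\delta/2) & e^{-2i\theta}\sin(\delta/2)\\ -e^{2i\theta}\sin(\delta/2) & \cos(\delta/2)\end{pmatrix},$$

$$M = \begin{pmatrix}1 & 0 & 0 & 0\\ 0 & \cos^2 2\theta + \sin^2 2\theta\cos\delta & \cos2\theta\sin2\theta\,(1-\cos\delta) & \sin2\theta\sin\delta\\ 0 & \cos2\theta\sin2\theta\,(1-\cos\delta) & \sin^2 2\theta + \cos^2 2\theta\cos\delta & -\cos2\theta\sin\delta\\ 0 & -\sin2\theta\sin\delta & \cos2\theta\sin\delta & \cos\delta\end{pmatrix}.$$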


The special cases of quarter-wave plates and half-wave plates are obtained from the above by setting δ = π/2 and δ = π, respectively.
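The Mueller matrix of a linear retarder can also be generated from its Jones matrix instead of being copied from the table. A common relation for nondepolarizing elements, assumed here rather than taken from this text, is M_ik = ½ Tr(σ_i J σ_k J^†), where the σ_i are ordered so that E^†σ_iE gives s0 = |E_x|² + |E_y|², s1 = |E_x|² − |E_y|², s2 = 2 Re(E_x*E_y), s3 = 2 Im(E_x*E_y). With that ordering, the sketch below reproduces every entry of the tabulated Mueller matrix. Names such as `SIGMA` and `mueller_from_jones` are illustrative.

```python
import cmath
import math

# Ordered so that s_i = E^dagger SIGMA[i] E gives
# s0 = |Ex|^2 + |Ey|^2, s1 = |Ex|^2 - |Ey|^2, s2 = 2 Re(Ex* Ey), s3 = 2 Im(Ex* Ey).
SIGMA = [
    [[1, 0], [0, 1]],
    [[1, 0], [0, -1]],
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
]


def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def dagger(a):
    """Conjugate transpose of a 2x2 matrix."""
    return [[complex(a[j][i]).conjugate() for j in range(2)] for i in range(2)]


def linear_retarder_xy(theta, delta):
    """Tabulated M_XY: fast line at angle theta, retardance delta."""
    c2, s2 = math.cos(theta) ** 2, math.sin(theta) ** 2
    off = -2j * math.sin(delta / 2) * math.cos(theta) * math.sin(theta)
    return [[c2 * cmath.exp(-0.5j * delta) + s2 * cmath.exp(0.5j * delta), off],
            [off, c2 * cmath.exp(0.5j * delta) + s2 * cmath.exp(-0.5j * delta)]]


def mueller_from_jones(j):
    """M_ik = (1/2) Tr(SIGMA_i J SIGMA_k J^dagger) for a nondepolarizing element."""
    jd = dagger(j)
    m = [[0.0] * 4 for _ in range(4)]
    for i in range(4):
        for k in range(4):
            prod = matmul(matmul(SIGMA[i], j), matmul(SIGMA[k], jd))
            m[i][k] = 0.5 * (prod[0][0] + prod[1][1]).real
    return m


for row in mueller_from_jones(linear_retarder_xy(0.35, 1.1)):
    print([round(x, 6) for x in row])
```

Setting δ = π/2 or δ = π in `linear_retarder_xy` gives the quarter-wave and half-wave cases.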


II.6 CIRCULAR RETARDERS


In the case of circular retarders, or optically active materials, right and left circular polarization states travel with different velocities and possess different indices of refraction, n_r and n_l. The retardance β is given by


$$\beta = kt(n_l - n_r).$$




A circular retarder of retardance β rotates a P-state through the angle β/2, away from the x-axis in the counterclockwise direction:


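$$M_{XY} = \begin{pmatrix}\cos(\beta/2) & -\sin(\beta/2)\\ \sin(\beta/2) & \cos(\beta/2)\end{pmatrix},\qquad M_{RL} = \begin{pmatrix}e^{-i\beta/2} & 0\\ 0 & e^{i\beta/2}\end{pmatrix},$$

$$M = \begin{pmatrix}1 & 0 & 0 & 0\\ 0 & \cos\beta & -\sin\beta & 0\\ 0 & \sin\beta & \cos\beta & 0\\ 0 & 0 & 0 & 1\end{pmatrix}.$$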
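For the circular retarder, the sketch below confirms that the XY matrix turns an x-directed P-state through β/2, and that the basis change U M U^† makes it diagonal with entries e^{−iβ/2} and e^{iβ/2}. The basis-change matrix from the start of this appendix is redefined in the block so that it is self-contained; the name `U` and the helper names are illustrative.

```python
import cmath
import math

SQRT2 = math.sqrt(2.0)
# Basis-change matrix from |P_x>, |P_y> to |R>, |L> (start of this appendix).
U = [[cmath.exp(1j * math.pi / 4) / SQRT2, cmath.exp(-1j * math.pi / 4) / SQRT2],
     [cmath.exp(-1j * math.pi / 4) / SQRT2, cmath.exp(1j * math.pi / 4) / SQRT2]]


def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def dagger(a):
    """Conjugate transpose of a 2x2 matrix."""
    return [[complex(a[j][i]).conjugate() for j in range(2)] for i in range(2)]


def circular_retarder_xy(beta):
    """Tabulated M_XY of a circular retarder: a rotation through beta/2."""
    c, s = math.cos(beta / 2), math.sin(beta / 2)
    return [[c, -s], [s, c]]


def output_angle(beta):
    """Azimuth of the emerging P-state when an x-directed P-state enters."""
    m = circular_retarder_xy(beta)
    return math.atan2(m[1][0], m[0][0])  # first column is the image of (1, 0)


beta = 0.8
print(output_angle(beta))  # approximately beta / 2
print(matmul(matmul(U, circular_retarder_xy(beta)), dagger(U)))
```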


APPENDIX III 


CONVENTIONS FOR POLARIZATION STATES 


Right and left elliptical polarization states are physically distinguished by their helicity. At a given instant of time, the tip of the electric vector traces out either a right- or a left-handed helix advancing in the direction of propagation. From a quantum mechanical point of view, the expectation value of the component of spin angular momentum in the direction of propagation is either positive or negative.

In practice one must decide which helicity shall be called right and which left. When these polarization states are then represented by certain mathematical forms, other choices must be made. It should be apparent that if a set of conventions is adopted for the particular case of right circularly polarized light, all other cases are also decided. The four conventions that must be decided for a right circular state are: the sign in the time dependence of a plane wave, e^{+iωt} or e^{−iωt}; the helicity; the form of the Jones vector; and the sign of the Stokes parameter P_3. Only two of these decisions may be made independently. Thus, there are four convention systems that may be used. The system used throughout this text is given in the first column of the table below, and is equivalent to the 1942 conventions of the Institute of Radio Engineers.


Conventions for right circular polarization (column 1 is the system used in this text)

                                              (1)     (2)     (3)     (4)
Exponential form of the plane wave
Helix of the plane wave
Expectation value of spin angular momentum    +ħ      −ħ      +ħ      −ħ
Form of the Jones vector
Sign of the Stokes parameter P_3              +1      −1      −1      +1
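The interdependence of these choices can be made concrete. The sketch below takes a Jones vector and a sign for the time dependence, forms the real field at z = 0, and reports (i) the sense of rotation about +z, i.e. the sign of the z-component of E × dE/dt, which for a wave travelling along +z has the sign of the spin component along the direction of propagation, and (ii) the sign of P_3, taken here, as an assumption, to be 2 Im(E_x* E_y). Jones vectors are left unnormalized because only signs matter. The function names are hypothetical.

```python
import math


def rotation_sense(jones, time_sign, omega=1.0, samples=16):
    """Sign of the z-component of E x dE/dt at z = 0 for the real field
    Re[jones * exp(time_sign * i * omega * t)]; +1 means counterclockwise about +z."""
    total = 0.0
    for n in range(samples):
        t = 2 * math.pi * n / (samples * omega)
        phase = complex(math.cos(omega * t), time_sign * math.sin(omega * t))
        ex, ey = (jones[0] * phase).real, (jones[1] * phase).real
        # d/dt Re[a exp(s i w t)] = Re[a * s i w * exp(s i w t)]
        dex = (jones[0] * phase * 1j * time_sign * omega).real
        dey = (jones[1] * phase * 1j * time_sign * omega).real
        total += ex * dey - ey * dex
    return 1 if total > 0 else -1


def p3_sign(jones):
    """Sign of the Stokes parameter P3 = 2 Im(Ex* Ey)."""
    return 1 if (jones[0].conjugate() * jones[1]).imag > 0 else -1


for jones in ((1, 1j), (1, -1j)):
    for time_sign in (-1, +1):
        print(jones, time_sign, rotation_sense(jones, time_sign), p3_sign(jones))
```

Changing the time-dependence sign flips the rotation sense for a fixed Jones vector while leaving the sign of P_3 unchanged, which is why only two of the four choices are independent.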


APPENDIX IV 


THE DIRAC DELTA FUNCTION 


We examine in this appendix how one may interpret and utilize a symbol δ(x − x₀), which is defined by

$$\int_{-\infty}^{\infty} f(x)\,\delta(x-x_0)\,dx = f(x_0), \tag{AIV.1}$$

where f(x) is a continuous function of x. The symbol δ(x − x₀) is known as the Dirac delta function.

Consider two functions, f(x) as above and Δ(x), illustrated in Fig. AIV.1.

The function Δ(x) is considered to be nearly zero everywhere except in some small region around x₀. Let us examine the integral

$$\int_{-\infty}^{+\infty} f(x)\,\Delta(x)\,dx.$$

By stating that Δ(x) is nearly zero outside of the interval x₀ − a < x < x₀ + a, we mean that we may approximate as follows:

$$\int_{-\infty}^{+\infty} f(x)\,\Delta(x)\,dx \approx \int_{x_0-a}^{x_0+a} f(x)\,\Delta(x)\,dx.$$
Figure AIV.1 (a), (b)


Imagine now that the magnitude of the interval is greatly exaggerated in 
Fig. AIV.1 and that a is so small that f(x) is essentially constant over the 
interval. Then f(x) may be carried through the integral sign: 


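$$\int_{-\infty}^{+\infty} f(x)\,\Delta(x)\,dx \approx f(x_0)\int_{x_0-a}^{x_0+a}\Delta(x)\,dx. \tag{AIV.2}$$

If, in addition,

$$\int_{x_0-a}^{x_0+a}\Delta(x)\,dx = 1, \tag{AIV.3}$$

Eq. (AIV.2) becomes

$$\int_{-\infty}^{+\infty} f(x)\,\Delta(x)\,dx \approx f(x_0). \tag{AIV.4}$$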


The difference between Eqs. (AIV.1) and (AIV.4) is, of course, that Eq. 
(AIV.1) implies a strict equality rather than an approximation. 

By a judicious definition of A(x) we can, however, make the two equa- 
tions equivalent. Consider the integral 


$$\int_{-\infty}^{\infty}dx\int_{-\infty}^{\infty}dk\,f(x)\,e^{ik(x-x_0)}.$$

The integral over k is an indeterminate form and must be interpreted as

$$\int_{-\infty}^{\infty}dk\,e^{ik(x-x_0)} = \lim_{L\to\infty}\int_{-L}^{L}dk\,e^{ik(x-x_0)} = 2\pi\lim_{L\to\infty}\frac{\sin L(x-x_0)}{\pi(x-x_0)}.$$

Two graphs of the function sin L(x − x₀)/π(x − x₀), for L = L₁ and L = L₂ with L₁ ≪ L₂, are shown in Fig. AIV.2. The area under this curve is independent of L and is given by

$$\int_{-\infty}^{\infty}\frac{\sin L(x-x_0)}{\pi(x-x_0)}\,dx = 1. \tag{AIV.5}$$

By comparing Figs. AIV.2 and AIV.1 and Eqs. (AIV.3) and (AIV.5), it is seen that a possible choice for Δ(x) is

$$\Delta(x) = \frac{\sin L(x-x_0)}{\pi(x-x_0)},$$

and that δ(x − x₀) may be realized by

$$\delta(x-x_0) = \lim_{L\to\infty}\frac{\sin L(x-x_0)}{\pi(x-x_0)}. \tag{AIV.6}$$

Figure AIV.2 (a), (b): the function sin L(x − x₀)/π(x − x₀) for L = L₁ and for L = L₂ ≫ L₁.


If Eq. (AIV.6) is interpreted as a limit in the usual sense, no limit exists 
since the sine oscillates. However, when placed within an integral, it yields 
Eq. (AIV.1). Therefore, we may write 


$$\int_{-\infty}^{\infty} e^{ik(x-x_0)}\,dk = 2\pi\,\delta(x-x_0). \tag{AIV.7}$$


A number of other useful representations of δ(x − x₀) may be obtained, examples of which are given without proof:

$$\delta(x-x_0) = \lim_{c\to\infty}\frac{\sin^2 c(x-x_0)}{c\pi(x-x_0)^2},$$

$$\delta(x-x_0) = \lim_{c\to\infty}\sqrt{\frac{c}{\pi}}\,e^{-c(x-x_0)^2}.$$


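A commonly occurring integral is

$$\int d\mathbf{k}\,e^{i\mathbf{k}\cdot(\mathbf{r}-\mathbf{r}_0)}, \tag{AIV.8}$$

where the integration is taken over all of k-space, and

$$\mathbf{k} = k_x\mathbf{e}_x + k_y\mathbf{e}_y + k_z\mathbf{e}_z,\qquad \mathbf{r}-\mathbf{r}_0 = (x-x_0)\,\mathbf{e}_x + (y-y_0)\,\mathbf{e}_y + (z-z_0)\,\mathbf{e}_z.$$

Writing out Eq. (AIV.8) in full, we have

$$\int_{-\infty}^{\infty}dk_x\,e^{ik_x(x-x_0)}\int_{-\infty}^{\infty}dk_y\,e^{ik_y(y-y_0)}\int_{-\infty}^{\infty}dk_z\,e^{ik_z(z-z_0)},$$

which from Eq. (AIV.7) may be interpreted as

$$\int d\mathbf{k}\,e^{i\mathbf{k}\cdot(\mathbf{r}-\mathbf{r}_0)} = (2\pi)^3\,\delta(x-x_0)\,\delta(y-y_0)\,\delta(z-z_0) = (2\pi)^3\,\delta(\mathbf{r}-\mathbf{r}_0).$$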


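Equation (AIV.7) may be used to give a proof of the inversion formula for a Fourier transform. Consider the integral

$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} g(k)\,e^{ikx}\,dk, \tag{AIV.9}$$

where g(k) is given by

$$g(k) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} f(x')\,e^{-ikx'}\,dx'. \tag{AIV.10}$$

Placing the form of Eq. (AIV.10) in Eq. (AIV.9) yields

$$\frac{1}{2\pi}\int_{-\infty}^{\infty}dk\int_{-\infty}^{\infty}dx'\,f(x')\,e^{ik(x-x')}.$$

Interchanging the order of integration and recognizing

$$\int_{-\infty}^{\infty}dk\,e^{ik(x-x')} = 2\pi\,\delta(x-x'),$$

we obtain

$$\frac{1}{2\pi}\int_{-\infty}^{\infty} 2\pi\,f(x')\,\delta(x-x')\,dx' = f(x).$$

Thus, if g(k) is defined by

$$g(k) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} f(x)\,e^{-ikx}\,dx,$$

f(x) is given by

$$f(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} g(k)\,e^{ikx}\,dk.$$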
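The inversion formula can be checked on a case with a known transform. With the symmetric 1/√(2π) convention of Eqs. (AIV.9) and (AIV.10), f(x) = e^{−x²/2} has g(k) = e^{−k²/2}. The sketch below evaluates both integrals with a trapezoidal rule on [−12, 12]; the truncation and step count are assumptions suited to this Gaussian, and other functions may need different values.

```python
import cmath
import math


def trapezoid(func, a, b, n):
    """Composite trapezoidal rule for a (possibly complex-valued) function on [a, b]."""
    h = (b - a) / n
    total = 0.5 * (func(a) + func(b)) + sum(func(a + i * h) for i in range(1, n))
    return total * h


def forward(f, k, lim=12.0, n=400):
    """g(k) of Eq. (AIV.10), with the infinite range truncated to [-lim, lim]."""
    return trapezoid(lambda x: f(x) * cmath.exp(-1j * k * x), -lim, lim, n) / math.sqrt(2 * math.pi)


def inverse(g, x, lim=12.0, n=400):
    """f(x) recovered from g(k) by the inversion formula, with the same truncation."""
    return trapezoid(lambda k: g(k) * cmath.exp(1j * k * x), -lim, lim, n) / math.sqrt(2 * math.pi)


def f(x):
    return math.exp(-x * x / 2)


print(forward(f, 1.3), math.exp(-1.3 ** 2 / 2))
print(inverse(lambda k: forward(f, k), 0.7), f(0.7))
```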
The following useful properties of the delta function are given without proof:

$$\int_{-\infty}^{\infty} F(t)\,\frac{d}{dt}\delta(t-a)\,dt = -\left.\frac{dF}{dt}\right|_{t=a}, \tag{AIV.11}$$


f FSIE — aldt = go (AIV.12) 


APPENDIX V 


USEFUL VECTOR IDENTITIES 


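In the following, φ is taken as a scalar function of position. Vector functions of position are represented by A, B, C, and D. The operator ∇ expressed in Cartesian coordinates is given by

$$\nabla = \mathbf{e}_x\frac{\partial}{\partial x} + \mathbf{e}_y\frac{\partial}{\partial y} + \mathbf{e}_z\frac{\partial}{\partial z}.$$

Symbols such as (B·∇)A are to be interpreted as vectors whose i-th component is

$$[(\mathbf{B}\cdot\nabla)\mathbf{A}]_i = \sum_j B_j\,\frac{\partial A_i}{\partial x_j}.$$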


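1. $\mathbf{A}\cdot(\mathbf{B}\times\mathbf{C}) = \mathbf{C}\cdot(\mathbf{A}\times\mathbf{B}) = \mathbf{B}\cdot(\mathbf{C}\times\mathbf{A})$

2. $\mathbf{A}\times(\mathbf{B}\times\mathbf{C}) = \mathbf{B}(\mathbf{A}\cdot\mathbf{C}) - \mathbf{C}(\mathbf{A}\cdot\mathbf{B})$

3. $(\mathbf{A}\times\mathbf{B})\cdot(\mathbf{C}\times\mathbf{D}) = (\mathbf{A}\cdot\mathbf{C})(\mathbf{B}\cdot\mathbf{D}) - (\mathbf{A}\cdot\mathbf{D})(\mathbf{B}\cdot\mathbf{C})$

4. $\nabla\times(\nabla\phi) = 0$

5. $\nabla\cdot(\nabla\times\mathbf{A}) = 0$

6. $\nabla\cdot(\phi\mathbf{A}) = \nabla\phi\cdot\mathbf{A} + \phi\,\nabla\cdot\mathbf{A}$

7. $\nabla\times(\phi\mathbf{A}) = \nabla\phi\times\mathbf{A} + \phi\,\nabla\times\mathbf{A}$

8. $\nabla(\mathbf{A}\cdot\mathbf{B}) = (\mathbf{A}\cdot\nabla)\mathbf{B} + (\mathbf{B}\cdot\nabla)\mathbf{A} + \mathbf{A}\times(\nabla\times\mathbf{B}) + \mathbf{B}\times(\nabla\times\mathbf{A})$

9. $\nabla\cdot(\mathbf{A}\times\mathbf{B}) = \mathbf{B}\cdot(\nabla\times\mathbf{A}) - \mathbf{A}\cdot(\nabla\times\mathbf{B})$

10. $\nabla\times(\mathbf{A}\times\mathbf{B}) = \mathbf{A}(\nabla\cdot\mathbf{B}) - \mathbf{B}(\nabla\cdot\mathbf{A}) + (\mathbf{B}\cdot\nabla)\mathbf{A} - (\mathbf{A}\cdot\nabla)\mathbf{B}$

11. $\nabla\times(\nabla\times\mathbf{A}) = \nabla(\nabla\cdot\mathbf{A}) - (\nabla\cdot\nabla)\mathbf{A}$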
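The purely algebraic identities 1 to 3 can be spot-checked with random vectors; residuals at the level of rounding error are evidence (not proof) that a transcription is correct. The helper names below are illustrative.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def combo(p, a, q, b):
    """Return the vector p*a + q*b."""
    return tuple(p * x + q * y for x, y in zip(a, b))


def max_residuals(trials=200, seed=1):
    """Largest violation of identities 1, 2, 3 over random vectors A, B, C, D."""
    rng = random.Random(seed)
    worst = [0.0, 0.0, 0.0]
    for _ in range(trials):
        A, B, C, D = [tuple(rng.uniform(-1.0, 1.0) for _ in range(3)) for _ in range(4)]
        t = dot(A, cross(B, C))
        worst[0] = max(worst[0], abs(t - dot(C, cross(A, B))), abs(t - dot(B, cross(C, A))))
        lhs = cross(A, cross(B, C))
        rhs = combo(dot(A, C), B, -dot(A, B), C)
        worst[1] = max(worst[1], max(abs(x - y) for x, y in zip(lhs, rhs)))
        lhs3 = dot(cross(A, B), cross(C, D))
        rhs3 = dot(A, C) * dot(B, D) - dot(A, D) * dot(B, C)
        worst[2] = max(worst[2], abs(lhs3 - rhs3))
    return worst


print(max_residuals())
```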


In the following, integrals including the differential symbol dV are to be taken over a finite volume of space. Such volumes are bounded by a closed surface whose differential area is represented by dA. The differential of an open area is written dS, and the directed segment of the boundary of such an area is written dl. The symbol n represents a unit vector normal to a surface, directed outward from a closed surface; for an open surface, n is directed so that dl × n points away from the surface.


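12. $\int dS\,(\nabla\times\mathbf{B})\cdot\mathbf{n} = \oint \mathbf{B}\cdot d\mathbf{l}$

13. $\int dV\,\nabla\times\mathbf{B} = \oint dA\,\mathbf{n}\times\mathbf{B}$

14. $\int dV\,\nabla\cdot\mathbf{B} = \oint dA\,\mathbf{B}\cdot\mathbf{n}$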
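Identity 14, the divergence theorem, can be illustrated on the unit cube with the hypothetical field B = (x², yz, z), for which both sides equal 5/2. The sketch estimates ∇·B by central differences and both integrals by the midpoint rule; the field and grid size are assumptions made only for this example.

```python
def field(x, y, z):
    """Hypothetical test field B = (x^2, y z, z)."""
    return (x * x, y * z, z)


def divergence(x, y, z, h=1e-4):
    """Central-difference estimate of div B."""
    dbx = (field(x + h, y, z)[0] - field(x - h, y, z)[0]) / (2 * h)
    dby = (field(x, y + h, z)[1] - field(x, y - h, z)[1]) / (2 * h)
    dbz = (field(x, y, z + h)[2] - field(x, y, z - h)[2]) / (2 * h)
    return dbx + dby + dbz


def volume_integral(n=20):
    """Midpoint rule for the integral of div B over the unit cube."""
    h = 1.0 / n
    pts = [(i + 0.5) * h for i in range(n)]
    return sum(divergence(x, y, z) for x in pts for y in pts for z in pts) * h ** 3


def surface_integral(n=20):
    """Midpoint rule for the outward flux of B through the six faces of the unit cube."""
    h = 1.0 / n
    pts = [(i + 0.5) * h for i in range(n)]
    flux = 0.0
    for u in pts:
        for v in pts:
            flux += field(1.0, u, v)[0] - field(0.0, u, v)[0]  # faces x = 1 and x = 0
            flux += field(u, 1.0, v)[1] - field(u, 0.0, v)[1]  # faces y = 1 and y = 0
            flux += field(u, v, 1.0)[2] - field(u, v, 0.0)[2]  # faces z = 1 and z = 0
    return flux * h * h


print(volume_integral(), surface_integral())
```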


APPENDIX VI 


MOMENTUM SPACE REPRESENTATIONS OF
LINEAR MOMENTUM AND ANGULAR MOMENTUM

VI.1 LINEAR MOMENTUM
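By Eq. (7.25),

$$\mathbf{g} = \frac{\mathbf{S}}{c^2} = \frac{1}{c^2}\,\mathbf{E}(\mathbf{r})\times\mathbf{H}(\mathbf{r}),\qquad \mathbf{P} = \frac{1}{c^2}\int d\mathbf{r}\,\mathbf{E}(\mathbf{r})\times\mathbf{H}(\mathbf{r}),$$

which may be rewritten, by use of Eq. (8.7), as

$$\begin{aligned}
\mathbf{P} &= \frac{1}{c^2(2\pi)^3}\iiint d\mathbf{k}\,d\mathbf{k}'\,d\mathbf{r}\,\boldsymbol{\mathcal{E}}(\mathbf{k})\times\boldsymbol{\mathcal{H}}(\mathbf{k}')\,e^{i(\mathbf{k}+\mathbf{k}')\cdot\mathbf{r}}\\
&= \frac{1}{c^2}\iint d\mathbf{k}\,d\mathbf{k}'\,\boldsymbol{\mathcal{E}}(\mathbf{k})\times\boldsymbol{\mathcal{H}}(\mathbf{k}')\,\delta(\mathbf{k}+\mathbf{k}')\\
&= \frac{1}{c^2}\int d\mathbf{k}\,\boldsymbol{\mathcal{E}}(\mathbf{k})\times\boldsymbol{\mathcal{H}}(-\mathbf{k}).
\end{aligned}$$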


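By Eq. (8.9),

$$i\mathbf{k}\times\boldsymbol{\mathcal{H}}(\mathbf{k}) = \varepsilon\,\dot{\boldsymbol{\mathcal{E}}}(\mathbf{k}),$$

$$i\mathbf{k}\times[\mathbf{k}\times\boldsymbol{\mathcal{H}}(\mathbf{k})] = \varepsilon\,\mathbf{k}\times\dot{\boldsymbol{\mathcal{E}}}(\mathbf{k}),$$

$$i[\mathbf{k}(\mathbf{k}\cdot\boldsymbol{\mathcal{H}}) - \boldsymbol{\mathcal{H}}(\mathbf{k}\cdot\mathbf{k})] = \varepsilon\,\mathbf{k}\times\dot{\boldsymbol{\mathcal{E}}}.$$

Since k·ℋ = 0, by Eq. (8.11) we obtain

$$\boldsymbol{\mathcal{H}}(\mathbf{k}) = \frac{i\varepsilon\,\mathbf{k}\times\dot{\boldsymbol{\mathcal{E}}}(\mathbf{k})}{k^2}, \tag{AVI.1}$$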


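so that

$$\mathbf{P} = \frac{\varepsilon}{ic^2}\int d\mathbf{k}\,\frac{\boldsymbol{\mathcal{E}}(\mathbf{k})\times[\mathbf{k}\times\dot{\boldsymbol{\mathcal{E}}}(-\mathbf{k})]}{k^2}.$$

Now,

$$\boldsymbol{\mathcal{E}}\times[\mathbf{k}\times\dot{\boldsymbol{\mathcal{E}}}] = \mathbf{k}(\boldsymbol{\mathcal{E}}\cdot\dot{\boldsymbol{\mathcal{E}}}) - \dot{\boldsymbol{\mathcal{E}}}(\mathbf{k}\cdot\boldsymbol{\mathcal{E}}).$$

But k·ℰ = 0, by Eq. (8.10), so

$$\mathbf{P} = \frac{\varepsilon}{ic^2}\int d\mathbf{k}\,\frac{\mathbf{k}}{k^2}\,\boldsymbol{\mathcal{E}}(\mathbf{k})\cdot\dot{\boldsymbol{\mathcal{E}}}(-\mathbf{k}).$$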
By Eqs. (8.19), (8.20), and (8.27), 
pid [anti fo +f oft ff + fee) 
where the following definitions of symbols have been used: 


$$\mathbf{f}^{+} = \mathbf{f}(\mathbf{k}),\qquad \mathbf{f}^{-} = \mathbf{f}(-\mathbf{k}),\quad \text{etc.}$$


Remembering that 


IEZI LIMES 


and that FE. = L = hase it is possible by a simple change of variable, 


replacing k by —k, to show that the first two integrals vanish and that the 
last two are equal. We thus obtain 


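$$\mathbf{P} = \int d\mathbf{k}\,\hbar\mathbf{k}\,\mathbf{f}^*(\mathbf{k})\cdot\mathbf{f}(\mathbf{k}),$$

or, for the α component of linear momentum,

$$P_\alpha = \int d\mathbf{k}\,\mathbf{f}^*(\mathbf{k})\cdot\hbar k_\alpha\,\mathbf{f}(\mathbf{k}),$$

which is in agreement with Eq. (8.29).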
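Relation (AVI.1), used in this derivation, can be checked for a single wave vector. The sketch assumes, as the derivation does, that Eq. (8.9) reads ik × ℋ(k) = εℰ̇(k) and that ℰ̇ is transverse to k. The values of k, ε, and ℰ̇ are arbitrary test data, and the dot product deliberately omits complex conjugation because k·ℋ = 0 is an algebraic condition, not an inner product.

```python
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    """Bilinear dot product (no complex conjugation)."""
    return sum(x * y for x, y in zip(a, b))


def h_from_edot(k, edot, eps):
    """Eq. (AVI.1): H(k) = i eps k x Edot(k) / k^2."""
    k2 = dot(k, k)
    return tuple(1j * eps * c / k2 for c in cross(k, edot))


k = (0.3, -1.2, 0.8)
eps = 2.5
edot = cross(k, (1 + 2j, 0.5, -1j))  # any vector crossed with k is transverse to k
h = h_from_edot(k, edot, eps)
lhs = tuple(1j * c for c in cross(k, h))   # i k x H
rhs = tuple(eps * c for c in edot)         # eps * Edot
print(max(abs(a - b) for a, b in zip(lhs, rhs)), abs(dot(k, h)))
```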
VI.2 ANGULAR MOMENTUM 
By Eq. (9.2),

$$\mathbf{J} = \frac{1}{c^2}\int d\mathbf{r}\,\mathbf{r}\times[\mathbf{E}(\mathbf{r})\times\mathbf{H}(\mathbf{r})],$$




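which may be rewritten, by use of Eq. (8.7), as

$$\mathbf{J} = \frac{1}{c^2(2\pi)^3}\iiint d\mathbf{k}\,d\mathbf{k}'\,d\mathbf{r}\,\left[\mathbf{r}\,e^{i(\mathbf{k}+\mathbf{k}')\cdot\mathbf{r}}\right]\times[\boldsymbol{\mathcal{E}}(\mathbf{k})\times\boldsymbol{\mathcal{H}}(\mathbf{k}')].$$

Now,

$$\int d\mathbf{r}\,\mathbf{r}\,e^{i(\mathbf{k}+\mathbf{k}')\cdot\mathbf{r}} = -i\nabla_{k'}\int d\mathbf{r}\,e^{i(\mathbf{k}+\mathbf{k}')\cdot\mathbf{r}} = -i\nabla_{k'}\,\delta(\mathbf{k}+\mathbf{k}')\,(2\pi)^3,$$

so we may write

$$\mathbf{J} = -\frac{i}{c^2}\iint d\mathbf{k}\,d\mathbf{k}'\,[\nabla_{k'}\delta(\mathbf{k}+\mathbf{k}')]\times[\boldsymbol{\mathcal{E}}(\mathbf{k})\times\boldsymbol{\mathcal{H}}(\mathbf{k}')].$$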




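Treating dk′∇_{k′}δ(k + k′) = du as a perfect differential and integrating by parts over k′, we obtain

$$\mathbf{J} = \frac{i}{c^2}\iint d\mathbf{k}\,d\mathbf{k}'\,\delta(\mathbf{k}+\mathbf{k}')\,\nabla_{k'}\times[\boldsymbol{\mathcal{E}}(\mathbf{k})\times\boldsymbol{\mathcal{H}}(\mathbf{k}')].$$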


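By Eq. (AVI.1),

$$\boldsymbol{\mathcal{H}}(\mathbf{k}') = \frac{i\varepsilon\,\mathbf{k}'\times\dot{\boldsymbol{\mathcal{E}}}(\mathbf{k}')}{k'^2},$$

so that

$$\mathbf{J} = -\frac{\varepsilon}{c^2}\iint d\mathbf{k}\,d\mathbf{k}'\,\delta(\mathbf{k}+\mathbf{k}')\,\nabla_{k'}\times\left(\boldsymbol{\mathcal{E}}(\mathbf{k})\times\left[\frac{\mathbf{k}'\times\dot{\boldsymbol{\mathcal{E}}}(\mathbf{k}')}{k'^2}\right]\right).$$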
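We expand the triple cross product, using A × (B × C) = B(A·C) − C(A·B), and obtain

$$\nabla_{k'}\times\left(\boldsymbol{\mathcal{E}}(\mathbf{k})\times\left[\frac{\mathbf{k}'\times\dot{\boldsymbol{\mathcal{E}}}(\mathbf{k}')}{k'^2}\right]\right) = \nabla_{k'}\times\left[\mathbf{k}'\,\frac{\boldsymbol{\mathcal{E}}(\mathbf{k})\cdot\dot{\boldsymbol{\mathcal{E}}}(\mathbf{k}')}{k'^2}\right] - \nabla_{k'}\times\left[\frac{\dot{\boldsymbol{\mathcal{E}}}(\mathbf{k}')}{k'^2}\,\boldsymbol{\mathcal{E}}(\mathbf{k})\cdot\mathbf{k}'\right].$$

Each bracket in this expression is of the form ∇ × (φA) = ∇φ × A + φ∇ × A. The first term yields

$$\nabla_{k'}\left[\frac{\boldsymbol{\mathcal{E}}(\mathbf{k})\cdot\dot{\boldsymbol{\mathcal{E}}}(\mathbf{k}')}{k'^2}\right]\times\mathbf{k}' + \frac{\boldsymbol{\mathcal{E}}(\mathbf{k})\cdot\dot{\boldsymbol{\mathcal{E}}}(\mathbf{k}')}{k'^2}\,(\nabla_{k'}\times\mathbf{k}'),$$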




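but ∇_{k′} × k′ = 0. The second term becomes

$$-\nabla_{k'}[\boldsymbol{\mathcal{E}}(\mathbf{k})\cdot\mathbf{k}']\times\frac{\dot{\boldsymbol{\mathcal{E}}}(\mathbf{k}')}{k'^2} - [\boldsymbol{\mathcal{E}}(\mathbf{k})\cdot\mathbf{k}']\,\nabla_{k'}\times\left[\frac{\dot{\boldsymbol{\mathcal{E}}}(\mathbf{k}')}{k'^2}\right],$$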


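and, by using the identity

$$\nabla(\mathbf{A}\cdot\mathbf{B}) = (\mathbf{A}\cdot\nabla)\mathbf{B} + (\mathbf{B}\cdot\nabla)\mathbf{A} + \mathbf{A}\times(\nabla\times\mathbf{B}) + \mathbf{B}\times(\nabla\times\mathbf{A}),$$

we obtain

$$\nabla_{k'}[\boldsymbol{\mathcal{E}}(\mathbf{k})\cdot\mathbf{k}'] = (\boldsymbol{\mathcal{E}}(\mathbf{k})\cdot\nabla_{k'})\mathbf{k}' + (\mathbf{k}'\cdot\nabla_{k'})\boldsymbol{\mathcal{E}}(\mathbf{k}) + \boldsymbol{\mathcal{E}}(\mathbf{k})\times(\nabla_{k'}\times\mathbf{k}') + \mathbf{k}'\times(\nabla_{k'}\times\boldsymbol{\mathcal{E}}(\mathbf{k})).$$

The last three terms vanish identically, and if the first term is written out explicitly, we obtain

$$\nabla_{k'}[\boldsymbol{\mathcal{E}}(\mathbf{k})\cdot\mathbf{k}'] = (\boldsymbol{\mathcal{E}}(\mathbf{k})\cdot\nabla_{k'})\,\mathbf{k}' = \boldsymbol{\mathcal{E}}(\mathbf{k}).$$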


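Collecting nonvanishing terms, we obtain

$$\nabla_{k'}\times\left(\boldsymbol{\mathcal{E}}(\mathbf{k})\times\left[\frac{\mathbf{k}'\times\dot{\boldsymbol{\mathcal{E}}}(\mathbf{k}')}{k'^2}\right]\right) = \nabla_{k'}\left[\frac{\boldsymbol{\mathcal{E}}(\mathbf{k})\cdot\dot{\boldsymbol{\mathcal{E}}}(\mathbf{k}')}{k'^2}\right]\times\mathbf{k}' - \boldsymbol{\mathcal{E}}(\mathbf{k})\times\frac{\dot{\boldsymbol{\mathcal{E}}}(\mathbf{k}')}{k'^2} - [\boldsymbol{\mathcal{E}}(\mathbf{k})\cdot\mathbf{k}']\,\nabla_{k'}\times\left[\frac{\dot{\boldsymbol{\mathcal{E}}}(\mathbf{k}')}{k'^2}\right].$$

We then have

$$\mathbf{J} = -\frac{\varepsilon}{c^2}\iint d\mathbf{k}\,d\mathbf{k}'\,\delta(\mathbf{k}+\mathbf{k}')\left\{\nabla_{k'}\left[\frac{\boldsymbol{\mathcal{E}}(\mathbf{k})\cdot\dot{\boldsymbol{\mathcal{E}}}(\mathbf{k}')}{k'^2}\right]\times\mathbf{k}' - \boldsymbol{\mathcal{E}}(\mathbf{k})\times\frac{\dot{\boldsymbol{\mathcal{E}}}(\mathbf{k}')}{k'^2} - [\boldsymbol{\mathcal{E}}(\mathbf{k})\cdot\mathbf{k}']\,\nabla_{k'}\times\left[\frac{\dot{\boldsymbol{\mathcal{E}}}(\mathbf{k}')}{k'^2}\right]\right\}.$$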


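If we now integrate over k′, the last term vanishes, since ℰ(k)·k = 0. Some care must be taken with the first term, since ∇_{k′} does not operate on ℰ(k) either before or after the integration over the δ-function. Thus, ℰ(k) is treated as a constant with respect to ∇_k. We indicate this by writing $\bar{\boldsymbol{\mathcal{E}}}(\mathbf{k})$. After the integration, we obtain

$$\mathbf{J} = -\frac{\varepsilon}{c^2}\int d\mathbf{k}\left\{\nabla_{k}\left[\frac{\bar{\boldsymbol{\mathcal{E}}}(\mathbf{k})\cdot\dot{\boldsymbol{\mathcal{E}}}(-\mathbf{k})}{k^2}\right]\times\mathbf{k} - \boldsymbol{\mathcal{E}}(\mathbf{k})\times\frac{\dot{\boldsymbol{\mathcal{E}}}(-\mathbf{k})}{k^2}\right\}.$$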


Now, using Eqs. (8.19), (8.20), and (8.27), we find 


- 2 [acus 450-07 x k - d EP LO) 


ih 
E IIIS WE S SE S UB SIX 


—[f* x f- —f* x f** x f*- x f- —f*- x f*!]. 


By a simple change of variable, replacing k by —k, one shows that 


Jer xf- = |a x ft 20 


fas txf* 
so that the last four terms yield 


$$-i\hbar\int d\mathbf{k}\,\mathbf{f}^*(\mathbf{k})\times\mathbf{f}(\mathbf{k}).$$


- jnre xr. 


Before proceeding, it is fruitful to examine integrals of the form

$$\int d\mathbf{k}\,\mathbf{k}\times\nabla_k F(\mathbf{k}),$$

where F is any scalar function of k. We wish to show that these integrals vanish. Using the identity ∇ × (φA) = ∇φ × A + φ∇ × A and ∇_k × k = 0, we obtain

$$\int d\mathbf{k}\,\mathbf{k}\times\nabla_k F = -\int d\mathbf{k}\,\nabla_k\times(\mathbf{k}F),$$

and by using the identity

$$\int dV\,\nabla\times\mathbf{B} = \oint dA\,\mathbf{n}\times\mathbf{B},$$

we obtain

$$-\int d\mathbf{k}\,\nabla_k\times(\mathbf{k}F) = -\oint dA\,\mathbf{n}\times(\mathbf{k}F),$$

where the surface of integration is taken over a sphere in k-space whose radius is allowed to become infinite, so that the volume integral ∫dk may be taken over the whole of k-space. On such a sphere n is parallel to k, so n × k = 0 and the integral vanishes.
As an example, consider 


IE k x Vif) - f —&)). 




Our result does not apply directly because of the presence of f; within the 
gradient. However, by applying 


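$$\nabla(\mathbf{A}\cdot\mathbf{B}) = (\mathbf{A}\cdot\nabla)\mathbf{B} + (\mathbf{B}\cdot\nabla)\mathbf{A} + \mathbf{A}\times(\nabla\times\mathbf{B}) + \mathbf{B}\times(\nabla\times\mathbf{A})$$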
to 
VQ fy VSS), and Vif f) 
one obtains 
Vit I) = Vif fo) Vi fe) 
and thus 


face x VES S) = face x Vk(f*-f-)— face x Vk(f* «f ). 


But we have shown that the first integral on the right vanishes. If we make 
the substitution of —k for k on the left, we obtain 


ILL x Vf fo) = face x V(f* fr). 
By comparing these results, we must have 
face x Visi -f-) =0. 
By a similar argument we can show that 
face x V f> + f**) = 0. 
However, the proof fails for the integrals 
ILL X Vre(fè -f**) and SL x Vk(f*- - f^), 
but does show that these integrals are both equal to 
- [aei x V«(f** f>). 
Finally, then, we obtain 
J= | dk (—ihk x. Ve SE SO) + | dk(—ihf*(k) x f(&), 
and taking the α component, we obtain
I= fak fli x Vela + [tci x na. 


which is that given by Eq. (9.51). 


APPENDIX VII 


INTEGRALS OF THE GAUSSIAN DISTRIBUTION 


The following definite integrals are given without derivation: 


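$$\int_{-\infty}^{+\infty} e^{-ax^2}\,dx = \sqrt{\frac{\pi}{a}}, \tag{AVII.1}$$

$$\int_{-\infty}^{+\infty} x^2 e^{-ax^2}\,dx = \frac{1}{2a}\sqrt{\frac{\pi}{a}}, \tag{AVII.2}$$

$$\int_{-\infty}^{+\infty} x^4 e^{-ax^2}\,dx = \frac{3}{4a^2}\sqrt{\frac{\pi}{a}}, \tag{AVII.3}$$

$$\int_{-\infty}^{+\infty} x^n e^{-ax^2}\,dx = 0,\qquad n = \text{odd integer}. \tag{AVII.4}$$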
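Equations (AVII.1) to (AVII.4) can be confirmed numerically for any a > 0. The sketch uses a trapezoidal rule truncated at |x| = 12/√a, beyond which the integrands are negligible; a = 1.7 is an arbitrary test value.

```python
import math


def trapezoid(func, a, b, n=20000):
    """Composite trapezoidal rule on [a, b]."""
    h = (b - a) / n
    return h * (0.5 * (func(a) + func(b)) + sum(func(a + i * h) for i in range(1, n)))


def gaussian_moment(power, a):
    """Integral of x**power * exp(-a x^2) over the real line, truncated at |x| = 12/sqrt(a)."""
    lim = 12.0 / math.sqrt(a)
    return trapezoid(lambda x: x ** power * math.exp(-a * x * x), -lim, lim)


a = 1.7
closed = {0: math.sqrt(math.pi / a),                      # (AVII.1)
          2: math.sqrt(math.pi / a) / (2 * a),            # (AVII.2)
          4: 3 * math.sqrt(math.pi / a) / (4 * a * a),    # (AVII.3)
          3: 0.0}                                         # (AVII.4), n odd
for p, value in closed.items():
    print(p, gaussian_moment(p, a), value)
```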


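VII.1 THE GAUSSIAN DISTRIBUTION
The Gaussian distribution f(x) for the random variable x is given by

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-x^2/2\sigma^2}.$$

Using Eqs. (AVII.1), (AVII.2), and (AVII.4), we easily show that

$$\int_{-\infty}^{+\infty} f(x)\,dx = 1,\qquad \int_{-\infty}^{+\infty} x\,f(x)\,dx = 0,\qquad \int_{-\infty}^{+\infty} x^2 f(x)\,dx = \sigma^2,$$

where σ is known as the standard deviation of the distribution and σ² as the variance.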


VII.2 THE BIVARIANT GAUSSIAN DISTRIBUTION
The bivariant Gaussian distribution f(x, y, ρ) is given by

$$f(x,y,\rho) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}}\exp\left\{-\frac{1}{2(1-\rho^2)}\left[\frac{x^2}{\sigma_x^2} - \frac{2\rho xy}{\sigma_x\sigma_y} + \frac{y^2}{\sigma_y^2}\right]\right\}.$$
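By completing the square of x in the exponent, we may cast this into the form

$$f(x,y,\rho) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}}\exp\left[-\frac{1}{2(1-\rho^2)\sigma_x^2}\left(x - \frac{\rho\sigma_x}{\sigma_y}\,y\right)^2\right]\exp\left(-\frac{y^2}{2\sigma_y^2}\right). \tag{AVII.5}$$

In this form it is easily seen from Eq. (AVII.1) that

$$\int_{-\infty}^{+\infty} f(x,y,\rho)\,dx = \frac{1}{\sigma_y\sqrt{2\pi}}\,e^{-y^2/2\sigma_y^2}.$$

Likewise, one may show

$$\int_{-\infty}^{+\infty} f(x,y,\rho)\,dy = \frac{1}{\sigma_x\sqrt{2\pi}}\,e^{-x^2/2\sigma_x^2},$$

so that the bivariant distribution represents a Gaussian distribution for each of the variables x and y.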
The correlation of the variables x and y is obtained by

$$\overline{xy} = \iint xy\,f(x,y,\rho)\,dx\,dy = \iint y\left(x - \frac{\rho\sigma_x}{\sigma_y}\,y\right)f(x,y,\rho)\,dx\,dy + \iint \frac{\rho\sigma_x}{\sigma_y}\,y^2 f(x,y,\rho)\,dx\,dy.$$

By use of Eqs. (AVII.5) and (AVII.4), the integration over x in the first term is zero, and the second term may be obtained as

$$\overline{xy} = \rho\sigma_x\sigma_y = \rho\sqrt{\overline{x^2}\;\overline{y^2}},$$

which identifies ρ as the correlation of the variables,

$$\rho = \frac{\overline{xy}}{\sqrt{\overline{x^2}\;\overline{y^2}}}.$$


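The correlation between x² and y² is given through

$$\overline{x^2y^2} = \iint x^2y^2 f(x,y,\rho)\,dx\,dy.$$

Again completing the square and making the substitutions

$$z = \frac{x}{\sigma_x} - \frac{\rho y}{\sigma_y},\qquad W = \frac{y}{\sigma_y},$$

we obtain

$$\overline{x^2y^2} = \frac{\sigma_x^2\sigma_y^2}{2\pi\sqrt{1-\rho^2}}\int_{-\infty}^{\infty}dW\int_{-\infty}^{\infty}dz\,(z^2 + 2zW\rho + \rho^2W^2)\,W^2\exp\left[-\frac{z^2}{2(1-\rho^2)}\right]\exp\left(-\frac{W^2}{2}\right).$$

Integrating over z and using Eqs. (AVII.1), (AVII.2), and (AVII.4), we obtain

$$\overline{x^2y^2} = \frac{\sigma_x^2\sigma_y^2}{\sqrt{2\pi}}\int_{-\infty}^{\infty}dW\,[1-\rho^2+\rho^2W^2]\,W^2e^{-W^2/2},$$

and the final integration yields

$$\overline{x^2y^2} = \sigma_x^2\sigma_y^2\,[1+2\rho^2],$$

or

$$\frac{\overline{x^2y^2}}{\overline{x^2}\;\overline{y^2}} = 1 + 2\rho^2.$$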
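The averages of xy and x²y² derived above can be checked by integrating the bivariant density directly. The sketch below uses a two-dimensional trapezoidal rule over ±9 standard deviations in each variable; the grid size and the test values σ_x = 1.3, σ_y = 0.7, ρ = 0.6 are arbitrary assumptions.

```python
import math


def bivariant_density(x, y, sx, sy, rho):
    """f(x, y, rho) of Sec. VII.2."""
    q = (x / sx) ** 2 - 2 * rho * x * y / (sx * sy) + (y / sy) ** 2
    norm = 2 * math.pi * sx * sy * math.sqrt(1 - rho * rho)
    return math.exp(-q / (2 * (1 - rho * rho))) / norm


def average(g, sx, sy, rho, n=300, width=9.0):
    """2-D trapezoidal estimate of the mean of g(x, y) over the box +/- width standard deviations."""
    hx, hy = 2 * width * sx / n, 2 * width * sy / n
    total = 0.0
    for i in range(n + 1):
        x = -width * sx + i * hx
        wx = 0.5 if i in (0, n) else 1.0
        for j in range(n + 1):
            y = -width * sy + j * hy
            wy = 0.5 if j in (0, n) else 1.0
            total += wx * wy * g(x, y) * bivariant_density(x, y, sx, sy, rho)
    return total * hx * hy


sx, sy, rho = 1.3, 0.7, 0.6
print(average(lambda x, y: x * y, sx, sy, rho), rho * sx * sy)
print(average(lambda x, y: (x * y) ** 2, sx, sy, rho), sx ** 2 * sy ** 2 * (1 + 2 * rho ** 2))
```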






